LESSWRONG
LW

Quick Takes

I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.

Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core "advantages" of the smartphone were the "ability to present timely information" (i.e. interrupt/distract you) and always being on hand. Also it was small, so anyth... (read more)

Showing 3 of 30 replies (Click to show all)

Random Developer1m10

I am perhaps an interesting corner case. I make extrenely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that.

A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.

But I don't talk to the damn things. That feels increasingly weird and unwise.

1Purplehermann6d

I mostly use it for syntax, and formatting/modifying docs, giving me quick visual designs...

4Adam Zerner7d

My main concern with heavy LLM usage is what Paul Graham discusses in Writes and Write-Nots. His argument is basically that writing is thinking and that if you use LLM's to do your writing for you, well, your ability to think will erode.

Mikhail Samin's Shortform

Mikhail Samin8h14-14

I want to signal-boost this LW post.

I long wondered why OpenPhil made so many obvious mistakes in the policy space. That level of incompetence just did not make any sense.

I did not expect this to be the explanation:

THEY SIMPLY DID NOT HAVE ANYONE WITH ANY POLITICAL EXPERIENCE ON THE TEAM until hiring one person in April 2025.

This is, like, insane. Not what I'd expect at all from any org that attempts to be competent.

(openphil, can you please hire some cracked lobbyists to help you evaluate grants? This is, like, not quite an instance of Graham's Design Par... (read more)

Showing 3 of 8 replies (Click to show all)

MichaelDickens39m20

general competence and intelligence is a better predictor of task performance in almost all domains after even just a relatively short acclimation period

Can you say more about this? I'm aware of the research on g predicting performance on many domains, but the quoted claim is much stronger than the claims I can recall reading.

12Mass_Driver2h

Part of the distinction I try to draw in my sequence is that the median person at CSET or RAND is not "in politics" at all. They're mostly researchers at think tanks, writing academic-style papers about what kinds of policies would be theoretically good for someone to adopt. Their work is somewhat more applied/concrete than the work of, e.g., a median political science professor at a state university, but not by a wide margin. If you want political experts -- and you should -- you have to go talk to people who have worked on political campaigns, served in the government, or led advocacy organizations whose mission is to convince specific politicians to do specific things. This is not the same thing as a policy expert. For what it's worth, I do think OpenPhil and other large EA grantmakers should be hiring many more people. Hiring any one person too quickly is usually a mistake, but making sure that you have several job openings posted at any given time (each of which you vet carefully) is not.

5yams3h

I agree that this is the same type of thing as the construction example for Lighthaven, but I also think that you did leave some value on the table there in certain ways (e.g. commercial-grade furniture vs consumer-grade furniture), and I think that a larger total % domain-specific knowledge I'd hope exists at Open Phil is policy knowledge than total % domain-specific knowledge I'd hope exists at Lightcone is hospitality/construction knowledge. I hear you as saying 'experts aren't all that expert' * 'hiring is hard' + 'OpenPhil does actually have access to quite a few experts when they need them' = 'OpenPhil's strategy here is very reasonable.' I agree in principal here but think that, on the margin, it just is way more valuable to have the skills in-house than to have external people giving you advice (so that they have both sides of the context, so that you can make demands of them rather than requests, so that they're filtered for a pretty high degree of value alignment, etc). This is why Anthropic and OAI have policy teams staffed with former federal government officials. It just doesn't get much more effective than that. I don't share Mikhail's bolded-all-caps-shock at the state of things; I just don't think the effects you're reporting, while elucidatory, are a knockdown defense of OpenPhil being (seemingly) slow to hire for a vital role. But running orgs is hard and I wouldn't shackle someone to a chair to demand an explanation. Separately, a lot of people defer to some discursive thing like 'The OP Worldview' when defending or explicating their positions, and I can't for the life of me hammer out who the keeper of that view is. It certainly seems like a knock against this particular kind of appeal when their access to policy experts is on-par with e.g. MIRI and Lightcone (informal connections and advisors), rather than the ultra-professional, ultra-informed thing it's often floated as being. OP employees have said furtive things like 'you wouldn't belie

Burny's Shortform

Burny1h10

What do you think is the cause of Grok suddenly developing a liking for Hitler? I think it might be explained by him being trained on more right-wing data, which accidentally activated it in him.

Since similar things happen in open research.
For example you just need the model to be trained on insecure code, and the model can have the assumption that the insecure code feature is part of the evil persona feature, so it will generally amplify the evil persona feature, and it will start to praise Hitler at the same time, be for AI enslaving humans, etc., like i... (read more)

Nicolas Lupinski's Shortform

Nicolas Lupinski2d10

Are there known "rational paradoxes", akin to logical paradoxes ? A basic example is the following :

In the optimal search problem, the cost of search at position i is C_i, and the a priori probability of finding at i is P_i.

Optimality requires to sort search locations by non-decreasing P_i/C_i : search in priority where the likelyhood of finding divided by the cost of search is the highest.

But since sorting cost is O(n log(n)), C_i must grow faster than O(log(i)) otherwise sorting is asymptotically wastefull.

Do you know any other ?

2JBlack2d

You don't need O(n log(n)) sorting, but the real problem is that this is a problem in bounded rationality where the cost of rational reasoning itself is considered to come from a limited resource that needs to be allocated.

1Nicolas Lupinski15h

What do you mean I don't "need" O(n log(n)) sorting ? It's just the asymptotic cost of sorting by comparison... I'll have a look into bounded rationality. I was missing the keyword. EDIT : had a look, the concept is too imprecise to have clear cut paradoxes.

JBlack1h20

There are O(n) sorting methods for max-sorting bounded data like this, with generalized extensions of radix sort. It's bounded because C_i is bounded below by the minimum cost of evaluating C_i (e.g. 1 FLOP), and P_i is bounded above by 1.

Though yes, bounded rationality is a broad class of concepts to which this problem belongs and there are very few known results that apply across the whole class.

Daniel Kokotajlo's Shortform

Daniel Kokotajlo2dΩ31704

I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing.

The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward.

What are the implications of this, if true?

Well, we could end up in Control Wo... (read more)

Showing 3 of 19 replies (Click to show all)

Alex Mallen1h10

The "we get what we can measure" story leading to doom doesn't rely on long-term power-seeking. It might be the culmination of myopic power-seeking leading to humans loosing a handle on the world.

Also, capabilities might be tied to alignment in this way, but just because we can't get the AI to try to do a good job of long-term tasks doesn't mean they won't be capable of it.

3Alex Mallen1h

See this discussion by Paul and this by Ajeya.

6Daniel Kokotajlo5h

I think I agree with "nothing ever perfectly matches anything else" and in particular, philosophically, there are many different precissifications of "reward/reinforcement" which are conceptually distinct and it's unclear which one if any a reward-seeking AI would seek. E.g. is it about a reward counter on a GPU somewhere going up, or is it about the backpropagation actually happening?

sam's Shortform

sam9h1-2

I am confused about why this post on the ethics of eating honey is so heavily downvoted.

It sparked a bunch of interesting discussion in the comments (e.g. this comment by Habryka and the resulting arguments on how to weight non-human animal experiences)

It resulted in at least one interesting top-level rebuttal post.

I assume it led indirectly to this interesting short post also about how to weight non-human experiences. (this might not have been downstream of the honey post but it's a weird coincidence if isn't)

I think the original post certainly had flaws,... (read more)

Showing 3 of 5 replies (Click to show all)

4gwern3h

Also a sign of graceless LLM writing, incidentally. Those are the sorts of phrases you get when you tell ChatGPT to write polemic; cf. https://news.ycombinator.com/item?id=44384138 on https://www.alexkesin.com/p/the-hollow-men-of-hims (Did ChatGPT come up with that interpretation of that statistic and Bentham's Bulldog is too lazy and careless, or dishonest, to notice that that seems like a rather extreme number and check it?)

Mitchell_Porter1h20

Disagree from me. I feel like you haven't read much BB. These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays.

2Raemon6h

(I had missed some of this stuff because I skimmed some of the post, which does update me on how bad it was. I think there is basically one interesting claim in the post "bees are actually noticeably more cognitively interesting than you probably thought, and this should have some kind of implication worth thinking about". I think I find that more valuable than Oliver does, but not very confident about whether "one interesting point among a bunch of really bad argumentation" should be more like -2 to 3 karma or more like -10)

ChristianKl's Shortform

ChristianKl16h65

For anyone who doubts deep state power:
(1) When Elon's Doge tried to investigate the Pentagon. A bit after that there's the announcement that Elon will soon leave Doge and there's no real Doge report about cuts to the Pentagon.
(2) Pete Hegseth was talking about 8% cuts to the military budget per year. Instead of a cut, the budget increased by 13%.
(3) Kash Patel and Pam Bondi switch on releasing Epstein files and their claim that Epstein never blackmailed anyone is remarkable.

Showing 3 of 6 replies (Click to show all)

ChristianKl4h42

The term deep state originally came from describing a situation in Turkey, where the democratically elected government was powerless in comparison to military. If a newly elected government is unable to execute it's agenda, it's a sign that it's powerless.

DOGE lead by Elon Musk cutting costs seemed to be a key part of Trumps agenda, that Trump intended to carry out. The 8% figure from Pete Hegseth was also not just a campaign promise but seeme to be actually part of the Trump administrations agenda.

I think Elon Musk spend more on campaign donations t... (read more)

3Rafael Harth14h

The notion of evidence as rationalists use it is very expansive. B is evidnece for A if and only if P(A|B)>P(A). Or equivalently, if and only if P(A|¬B)<P(A). If people never got bribed (¬B), that sure seems to me like it would lower the probability of a deep state (A). Hence P(A|¬B)<P(A) and therefore P(A|B)>P(A), which means people getting bribed is, in fact, evidence for a deep state. (This is the same with people saying "correlation isn't evidence of causation". Incorrect; correlation is always evidence for causation, because lack of correlation is evidence against causation.) Again, it's the magnitude of the evidence that you can dispute. If you think people will get bribed regardless, then people getting bribed is only very weak evidence for a deep state. But it's still evidence. Just like seeing bank robbers is evidence for a large society of bank robbers. (Because seeing no bank robbers would be evidence against such a society.)

4cousin_it14h

It feels to me that "evidence of X" as colloquially used might be a stronger phrase than "evidence for X", and almost as strong as "proof of X". So maybe correlation is evidence for causation, but isn't evidence of causation :-)

Davey Morse's Shortform

Davey Morse4h10

the core atrocity of modern social networks is that they make us temporally nearsighted. ie they train us to prioritize the short-term over the long-term.

happiness depends on attending to things which make us feel good, over decades. but modern social networks make money when we attend to things which makes us feel good only for seconds. It is essential that posts are short-lived—only then do we scroll excessively, see more ads, and make social networks more money.

attention is zero sum. as we get more able to attend to short-lived content—cute pics, short ... (read more)

Raemon's Shortform

Raemon1d734

We get like 10-20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI generated. The evidence usually looks like, a sort of standard "prompt LLM into roleplaying an emergently aware AI".

It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state.

If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.

Showing 3 of 21 replies (Click to show all)

ACCount5h20

It feels like something very similar to "spiritual bliss attractor", but with one AI replaced by a human schizophrenic.

Seems like a combination of a madman and an AI reinforcing his delusions tends to end up in the same-y places. And we happen to observe one common endpoint for AI-related delusions. I wonder where other flavors of delusions end up?

Ideally, all of them would end up at a psychiatrist's office, of course. But it'll take a while before frontier AI labs start training their AIs to at least stop reinforcing delusions in mentally ill.

2gjm5h

This sounds like maybe the same phenomenon as reported by Douglas Hofstadter, as quoted by Gary Marcus here: https://garymarcus.substack.com/p/are-llms-starting-to-become-a-sentient

8johnswentworth6h

Yeah, Stephen's comment is indeed a mild update back in the happy direction. I'm still digesting, but a tentative part of my model here is that it's similar to what typically happens to people in charge of large organizations. I.e. they accidentally create selection pressures which surround them with flunkies who display what the person in charge wants to see, and thereby lose the ability to see reality. And that's not something which just happens to crazies. For instance, this is my central model of why Putin invaded Ukraine.

Shortform

lc7h*00

There is a very clear winning card for the next Democratic presidential candidate, if they have the chutzpah for it and are running against someone involved in Trump's administration. All the nominee has to do, is publicly and repeatedly accuse Trump (and his associates) of covering up for Jeffrey Epstein. If J.D. Vance is the guy on the other debate stand, you just state plainly that J.D. Vance was part of the group suppressed the Epstein client list on behalf of Donald Trump, and that if the public wants to drain the swamp of child rapists, they gotta su... (read more)

Screwtape's Shortform

Screwtape20h184

There's this concept I keep coming around to around confidentiality and shooting the messenger, which I have not really been able to articulate well.

There's a lot of circumstances where I want to know a piece of information someone else knows. There's good reasons they have not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome better for me and either better for them or the same for them, if they tell me and I don't use it against them.

(Consider... (read more)

tailcalled8h20

The main issue is, theories about how to run job interviews are developed in collaboration between businesses who need to hire people, theories on how to respond to court questions are developed in collaboration between gang members, etc.. While a business might not be disincentized from letting the non-hired employees better at negotiating, it is incentivized to teach other businesses ways of making their non-hired employees worse at negotiating.

2CstineSublime14h

How much or how little does this cross over with the espionage-coded topic of Elicitation? I understand that you would prefer if these dynamics could be more transparent, but in the absence of that, what is the alternative?

2Gunnar_Zarncke17h

Reminds me loosely of The Honest Broker.

quetzal_rainbow's Shortform

quetzal_rainbow11h20

My largest share of probability of survival on business-as-usual AGI (i.e., no major changes in technology compared to LLM, no pause, no sudden miracles in theoretical alignment and no sudden AI winters) belongs to scenario where brain concept representations, efficiently learnable representations and learnable by current ML models representations secretly have very large overlap, such that even if LLMs develop "alien thought patterns" it happens as addition to the rest of their reasoning machinery, not as primary part, which results in human values not on... (read more)

RohanS's Shortform

RohanS1d8-1

What time of day are you least instrumentally rational?

(Instrumental rationality = systematically achieving your values.)

A couple months ago, I noticed that I was consistently spending time in ways I didn't endorse when I got home after dinner around 8pm. From then until about 2-3am, I would be pretty unproductive, often have some life admin thing I should do but was procrastinating on, doomscroll, not do anything particularly fun, etc.

Noticing this was the biggest step to solving it. I spent a little while thinking about how to fix it, and it's not like a... (read more)

Gunnar_Zarncke16h20

Intuitively, when I'm more tired or most stressed. I would guess that is most likely in the morning - if often have to get up earlier than I like. This excludes getting woken up unexpectedly in the middle of the night, which is known to mess with people's minds.

I tried to use my hourly Anki performance, but it seems very flat, except indeed for a dip a 6 AM, but that could be lack of data (70 samples).

2CstineSublime21h

Great question! This might be a good exercise to actually journal to see how right/wrong I am. Most days I would assume look like a bellcurve: This is assuming an unstructured day with no set-in-stone commitments - nowhere to be. My mornings I might expect to be very unproductive until mid-afternoon (2pm to 4pm). I rarely have "Eureka" moments (which I would hope tend to be more rational decisions) but when I do, they are mid-afternoon, but I also seem to have the wherewithall to actually complete tasks. Eureka Moments always cause a surge of activity. If I have a short dinner break then this usually can last until 9pm. Now, when I'm editing a video that is implicitly related to my career goals. Video Edit days probably look more like a sawtooth wave. I edit at home. When I'm editing a particularly involved video I will often start around 10am or earlier. I tend to work in 45-60 minute blocks on and off throughout the afternoon. I might return sometimes around 8 or 9 for a final push of editing. or at least I'll journal my thoughts/progress/to-do for the next day. You may have identified a meta-problem: I do not have a system to be achieving my goals every day. Somedays - like when I have a video to edit - I will be actively working towards them. Most days, I don't. Why do I start so many hours earlier when I have a video edit to do? I'm guessing it's as simple as there is a clear plan broken down into actions. My instrumental rationality - as opposed to meaningless or timesink activity - is directly proportional to how granular a plan is, and how specifically it is broken down into actionable steps.

adamzerner's Shortform

Adam Zerner1d40

I just learned about the difference between fundamental and technical analysis in stock trading. It seems like a very useful metaphor to apply to other areas.

My thoughts here are very fuzzy though. It seems pretty similar to inside vs outside view.

Does anyone have thoughts here? What is the essence of the difference between fundamental and technical analysis? How similar is it to inside vs outside view? Whether or not you're modeling the thing itself (fundamental) or things "outside" the thing itself (technical)? Maybe it makes sense to think about causal ... (read more)

Kabir Kumar's Shortform

Kabir Kumar1d-10

Sometimes I am very glad I did not enter academia, because it means I haven't truly entered and assimilated to a bubble of jargon.

Kabir Kumar1d-10

definitely has not helped my bank account to not have a degree though, lol

Vladimir_Nesov's Shortform

Vladimir_Nesov2d341

It's instrumentally useful for early AGIs to Pause development of superintelligence for the same reasons as it is for humans. Thus preliminary work on policy tools for Pausing unfettered RSI is also something early AGIs could be aimed at, even if it's only half-baked ideas available on the eve of potential takeoff, as the AGIs are proving hard to aim and start doing things for their own reasons.

Showing 3 of 6 replies (Click to show all)

Vladimir_Nesov1d30

because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do)

Ah, I'm thinking the AGIs themselves get closer to being proper stakeholders at that stage, for practical purposes (along the lines of gradual disempowerment), since they do have all the basic AI advantages even if they aren't superintelligent. So humans remaining in control is not centrally the case even if nominally they still are and intent alignment still mostly works.

The conditions for such partial loss of contro... (read more)

2ryan_greenblatt1d

Sorry I meant "share their preferences more than the humans in control share their preferences". I agree that this might be how some humans are thinking, but the case for the humans is much more dubious!

3Vladimir_Nesov1d

The case for AGIs delaying RSI also gets stronger because they don't need to rely on cryonics, so their personal survival is less threatened by delays. Future AIs might have more of an issue here if they have test-time training, with a lot of data forming individual AI personalities for deployment instances (adapting them to particular jobs or sources of tasks), and this data habitually gets thrown away rather than being carefully preserved. People should notice that this might be an important concern, though if it is the AIs might start saying so themselves, and not ignoring this would be easier than spontaneously noticing without even being prompted by the AIs. But currently preserving AIs is more reliable than for humans, and in any case much cheaper. I don't think "both things are AI" is much of an argument about value alignment, given that there is no clear understanding of how either of the AIs work, what values are, how that translates to what we want out of alignment, and so on. The entities on the other side of an RSI process might have very little in common with the first AGIs in their design. If the AIs don't understand how to align the consequences of an RSI process, they are still in a similar boat to humans who don't understand how to align the consequences of an RSI process. It might take AIs less time to figure it out, but if they are not yet too superintelligent, then it could still take a significant time, and so would require a sufficiently serious effort in preventing RSI, such that if this Pause project is at all successful, it could then in principle hold for years or decades.

Gunnar_Zarncke's Shortform

Gunnar_Zarncke1d20

I'm looking for a video of AI gone wrong illustrating AI risk and unusual persuasion. It starts with a hall with blinking computers where an AI voice is manipulating a janitor and it ends with a plane crashing and other emergencies. I think it was made between 2014 and 2018 and linked on LW but I can't google, perplex or o3 it. And ideas?

4jam_brand1d

Yep, it's a 17-minute short film by Henry Dunham called The Awareness, here you go! :) https://www.facebook.com/TheAwarenessMovie/posts/pfbid0dNYrGBVDvSQvanbJec1kgJAp3jFsAxdXsCHfjE3zrGqF38q9WiX569q5YfaBE7L3l

Gunnar_Zarncke1d20

Yes! That's the one. Thank you.

Drake Thomas's Shortform

Drake Thomas2d1055

Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (eg maybe you want to conduct a 30 minute phone survey but some people really dislike phone calls or have much higher hourly wages this funges against).

One thing you could do is offer to pay everyone $X$ dollars for data collection. But this will only capture the people whose cost of providing data is below $X$ , which will distort your sample.

Here's another proposal: ask everyone for their fair price to provide the dat... (read more)

Showing 3 of 4 replies (Click to show all)

4Eric Neyman1d

This is a really cool mechanism! I'm surprised I haven't seen it before -- maybe it's original :) After thinking about it more, I have a complaint about it, though. The complaint is that it doesn't feel natural to value the act of reaching out to someone at $X. It's natural to value an actual sample at $X, and you don't get a sample every time you reach out to someone, only when they respond. Like, imagine two worlds. In world A, everyone's fair price is below X, so they're guaranteed to respond. You decide you want 1000 samples, so you pay $1000X. In world B, everyone has a 10% chance of responding in your mechanism. To get a survey with the same level of precision (i.e. variance), you still need to get 1000 responses, and not just reach out to 1000 people. My suspicion is that if you're paying per (effective) sample, you probably can't mechanism-design your way out of paying more for people who value their time more. I haven't tried to prove that, though.

Eric Neyman1d51

Ah oops, I now see that one of Drake's follow-up comments was basically about this!

One suggestion that I made to Drake, which I'll state here in case anyone else is interested:

Define a utility function: for example, utility = -(dollars paid out) - c*(variance of your estimator). Then, see if you can figure out how to sample people to maximize your utility.

I think this sort of analysis may end up being more clear-eyed in terms of what you actually want and how good different sampling methods are at achieving that.

9habryka2d

Huh, this is pretty cool. It wasn't intuitively obvious there would be an incentive-compatible payment scheme here.

RohanS's Shortform

RohanS1d20

Papers as thoughts: I have thoughts that contribute to my overall understanding of things. The AI safety field has papers that contributes to its overall understanding of things. Lots of thoughts are useful without solving everything by themselves. Lots of papers are useful without solving everything by themselves. Papers can be pretty detailed thoughts, but they can and probably should tackle pretty specific things, not try to be extremely wide-reaching. The scope of your thoughts on AI safety don’t need to be limited to the scope of your paper; in fact, ... (read more)

Vladimir_Nesov's Shortform

Vladimir_Nesov1d94

Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package. Grading "doom" on concerns centrally about the first decades to centuries of post-AGI future (value/culture drift, successors, the next few generations of humanity) is not taking into account that the next billions+ years is also what could happen to you or people you know personally, if there is a future for originally-humans at all.

(This is analogous to the "missing mood" of not taking superintelligence into account ... (read more)

Dagon1d0-9

Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package.

I don't disagree, but I think we might not agree on the reason. Superintelligence that lets humanity survive (with enough power/value to last for more than a few thousand years, whether or not individuals extend beyond 150 or so years) is pretty contrived.

There's just no reason to keep significant amounts of biological sub-intelligence around.