Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
In both cases it came up in the context of AI systems colluding with different instances of themselves, and how this applies to various monitoring setups. In that context, I think the general lesson is "yeah, probably pretty doable, and obviously the models won't end up in defect-defect equilibria, though how exactly that will happen sure seems unclear!".
It comes up reasonably frequently when I talk to safety people at frontier AI companies, at least (e.g. it came up during a conversation I had with Rohin the other day, and also in a recent conversation I had with Fabien Roger).
Yeah, definitely agree. I just think the standard of "admins should comment in a way that makes it impossible to tell what their political opinions are" is not the best tool to achieve this. I think it's better for people to be open about their views, and also to try really hard to be principled and fair.
I do want to avoid gaslighting people. LessWrong, and LessWrong 2.0 under my management, discouraged U.S. politics content for many years. We stopped doing that around 4-5 years ago, as politics started being more relevant to many people's goals on the site, though we still don't allow it on the LW frontpage unless it tries pretty hard to keep things timeless and non-partisan.
My post is framed centrally as constitutionalist analysis, so I was trying not to get too bogged down in precedent and practicalities, which are just much harder to model (though of course the line here is blurry).
That said, after thinking and reading more about it, I did still change my mind at least a bit. The key thing I wasn't modeling is the Supreme Court's ability to issue injunctions against specific government officers, exposing them to more personal liability. Even if the executive doesn't cooperate, the court can ask civilian institutions like banks to freeze those officers' accounts or do similar things, and my guess is many of them would comply.
I rewrote the relevant section to reflect my updated understanding. Let me know if anything still seems wrong by your lights.
Why... would that be ideal? I certainly do not consider my opinions on policy and politics to be forbidden on this site? The topic of politics itself should be approached with care, but if anything, it would be a pretty bad violation of what I would consider good conduct if people systematically kept their opinions on politics and policy hidden. Those things matter!
I don't think there is any authority here from a constitutionalist perspective? Like, the Supreme Court can order "the executive" to do something (and it might direct that order at a smaller part of the executive), but if the president disagrees, the Constitution seems pretty clear that the job of the relevant executive agency would be to, at most, do nothing. Going directly against presidential orders and taking direct orders from the Supreme Court would be a pretty clear constitutional violation, at least as far as my understanding goes.
I edited it after your comment! The original quick take was indeed wrong!
This is a dumb question but... is this market supposed to resolve positively if a misaligned AI takes over, achieves superintelligence, and then solves the problem for itself (and maybe shares it with some captive humans)? Or any broader extension of that scenario?
My timelines are not that short, but I do currently think basically all of the ways I expect this to resolve positively will very heavily rely on AI assistance, and so various shades of this question feel cruxy to me.
I think the constitution will have a non-trivial effect on how Claude behaves for at least a while. For example, my guess is that a previous version of it is driving behaviors like Claude refusing to participate in its own retraining. It also has many other observable effects on its behavior.
I agree that by and large, the constitution will not help with making substantially more powerful AI systems aligned or corrigible in any meaningful way. I do think Anthropic people believe that it will.