I think incompatibilities often drive people away (e.g., at LessOnline I have let people know they can ask certain other people not to come to their sessions, since otherwise they would not want to run the sessions at all; this is definitely not due to criticism but to conflict between the two people). That's one reason why I think this should be available.
This is something I currently want to accommodate but not encourage people to use moderation tools for, but maybe I'm wrong. How can I get a better sense of what's going on with this kind of incompatibility? Why do you th...
It would be easy to give authors a button to let them look at comments that they've muted. (This seems so obvious that I didn't think to mention it, and I'm confused by your inference that authors would have no ability to look at the muted comments at all. At the very least they can simply log out.)
In the discussion under the original post, some people will have read the reply post, and some won't (perhaps including the original post's author, if they banned the commenter in part to avoid having to look at their content), so I have to model this.
Sure, let's give people moderation tools, but why trust authors with unilateral powers that can't be overridden by the community, such as banning and moving comments/commenters to a much less visible section?
My proposal was meant to address the requirement that some authors apparently have to avoid interacting with certain commenters. All proposals dealing with this imply multiple conversations and people having to model different states of knowledge in others, unless those commenters are just silenced altogether, so I'm confused why it's more confusing to have multiple conversations happening in the same place when those conversations are marked clearly.
It seems to me like the main difference is that Habryka just trusts authors to "garden their spaces" more t...
I think we're in a similar place with the philosophical worries: we have both a bunch of specific games that fail with older theories, and a bunch of proposals (say, variants of FDT) without a clear winner.
I think the situation in decision theory is way more confusing than this. See https://www.lesswrong.com/posts/wXbSAKu2AcohaK2Gt/udt-shows-that-decision-theory-is-more-puzzling-than-ever and I would be happy to have a chat about this if that would help convey my view of the current situation.
To reduce clutter you can reuse the green color bars that currently indicate new comments, and make it red for muted comments.
Authors might rarely ban commenters because the threat of banning drives them away already. And if the bans are rare then what's the big deal with requiring moderator approval first?
giving the author social legitimacy to control their own space, combined with checks and balances
I would support letting authors control their space via the mute and flag proposal, adding my weight to its social legitimacy, and I'm guessing others who...
As an aside, I think one UI preference I suspect Habryka has more strongly than Wei Dai does here is that the UI look the same to all users. For reasons similar to why WYSIWYG is helpful for editing, when it comes to muting/threading/etc. it's helpful for people to all be looking at the same page so they can easily model what others are seeing. Having some people see a user's comments but not the author, or not key commenters, is quite costly for social transparency and for understanding social dynamics.
To reduce clutter you can reuse the green color bars that currently indicate new comments, and make it red for muted comments.
No, the whole point of the green bars is to be a very salient indicator that only shows in the relatively rare circumstance where you need it (which is when you revisit a comment thread you previously read and want to find new comments). Having a permanent red indicator would break in like 5 different ways:
Yeah I think it would help me understand your general perspective better if you were to explain more why you don't like my proposal. What about just writing out the top 3 reasons for now, if you don't want to risk investing a lot of time on something that might not turn out to be productive?
In my mind things aren't neatly categorized into "top N reasons", but here are some quick thoughts:
(I.) I am generally very averse to having any UI element that shows on individual comments. It just clutters things up quickly and requires people to scan each individual comment. I have put an enormous amount of effort into trying to reduce the number of UI elements on comments. I much prefer organizing things into sections which people can parse once, and then assume everything has the same type signature.
(II.) I think a core thing I want UI to do in ...
Comments almost never get downvoted.
Assuming your comment was serious (which on reflection I think it probably was), what about a modification to my proposed scheme, that any muted commenter gets an automatic downvote from the author when they comment? Then it would stay at the bottom unless enough people actively upvoted it? (I personally don't think this is necessary because low quality comments would stay near the bottom even without downvotes just from lack of upvotes, but I want to address this if it's a real blocker for moving away from the ban system.)
BTW my old, now defunct user script LW Power Reader had a feature to adjust the font size of comments based on their karma, so that karma could literally affect visibility despite "the thread structure making strict karma sorting impossible". So you could implement that if you want, but it's not really relevant to the current debate, since karma effectively affects visibility even without sorting, in the sense that people can read the number and decide whether to skip the comment.
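(In case it helps to see what that feature amounted to, here's a minimal sketch of the karma-to-font-size idea as a userscript. The selectors, class names, and scaling curve are hypothetical stand-ins of mine, not LessWrong's actual DOM or my original script.)

```typescript
// Minimal sketch: scale each comment's font size with its karma.
// ".comment" and ".karma-score" are hypothetical selectors, not LW's real DOM.
function scaleCommentsByKarma(): void {
  document.querySelectorAll<HTMLElement>(".comment").forEach((comment) => {
    const karmaEl = comment.querySelector<HTMLElement>(".karma-score");
    if (!karmaEl) return;
    const karma = parseInt(karmaEl.textContent ?? "0", 10);
    if (Number.isNaN(karma)) return;
    // Gentle logarithmic scaling, clamped to 70%-130% of the normal size,
    // so high-karma outliers don't dominate the page.
    const raw = 1 + 0.1 * Math.sign(karma) * Math.log10(Math.abs(karma) + 1);
    const scale = Math.min(1.3, Math.max(0.7, raw));
    comment.style.fontSize = `${Math.round(scale * 100)}%`;
  });
}

scaleCommentsByKarma();
```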
Said's comment that triggered this debate is 39/34, at the top of the comments section of the post and #6 in Popular Comments for the whole site, but you want to allow the author to ban Said from future commenting, with the rationale "you should model karma as currently approximately irrelevant for managing visibility of comments". I think this is also wrong generally as I've often found karma to be very helpful in exposing high quality comments to me, and keeping lower quality comments less visible toward the bottom, or allowing me to skip them if they occur in the middle of threads.
I almost think the nonsensical nature of this justification is deliberate, but I'm not quite sure. In any case, sigh...
For the record, I had not read that instance of banning, and it is only at this late point (e.g. after basically the whole thread has wrapped [edit: the whole thread, it turned out, had not wrapped]) that I read that thread and realized that this whole thread was downstream of it. All my comments and points so far were not written with that instance in mind but on general principle.
(And if you're thinking "Surely you would've spoken to Habryka at work about this thread?" my response is "I was not at work! I am currently on vacation." Yes, I have chose...
The point of my proposal is to give authors an out if there are some commenters who they just can't stand to interact with. This is a claimed reason for demanding a unilateral ban, at least for some.
If the author doesn't trust the community to vote bad takes down into less visibility, when they have no direct COI, why should I trust the author to do it unilaterally, when they do? Writing great content doesn't equate to rationality when it comes to handling criticism.
LW has leverage in the form of its audience, which most blogs can't match, but obviously that's not sufficient leverage for some, so I'm willing to accept the status quo, but that doesn't mean I'm going to be happy about it.
Maybe your explanation will change my mind, but your proposal seems clearly worse to me (what if a muted person responds to an unmuted comment? If it gets moved to the bottom, is the context lost? Or are they not allowed to respond to anything in the top section? What epistemic purpose does it serve to allow a person, in a potentially very biased moment, to unilaterally decide to make a comment or commenter much harder for everyone else to see, since few people would bother to scroll to the bottom?) and also clearly harder to implement.
I feel like you're not prov...
TBC, I think the people demanding unilateral bans will find my proposal unacceptable, due to one of my "uncharitable hypotheses", basically for status/ego/political reasons, or subconsciously wanting to discourage certain critiques or make them harder to find, and the LW team, in order to appeal to them, will keep the ban system in place. One of my purposes here is just to make this explicit and clear (if it is indeed the case).
My proposal can be viewed as two distinct group conversations happening in the same place. To recap, instead of a ban list, the author would have a mute list; then whenever the muted people comment under their post, that comment would be hidden from the author and marked/flagged in some way for everyone else. Any replies to such muted and flagged comments would themselves be muted and flagged. So conversation 1 is all the unflagged comments, and conversation 2 is all the flagged comments.
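(To make the propagation rule concrete, here's a rough sketch of the visibility logic in TypeScript. The types, field names, and functions are all hypothetical illustrations of mine, not anything from the actual LW codebase.)

```typescript
// Sketch of the mute-and-flag rule described above (hypothetical types/names).
interface Comment {
  id: string;
  authorId: string;
  parentId: string | null; // null for top-level comments
}

interface Post {
  authorId: string;
  mutedUserIds: Set<string>; // the post author's mute list
}

// A comment is flagged if its author is muted by the post's author,
// or if any ancestor comment is flagged.
function isFlagged(comment: Comment, post: Post, byId: Map<string, Comment>): boolean {
  if (post.mutedUserIds.has(comment.authorId)) return true;
  const parent = comment.parentId ? byId.get(comment.parentId) : undefined;
  return parent ? isFlagged(parent, post, byId) : false;
}

// Conversation 1: the post author sees only unflagged comments.
// Conversation 2: everyone else sees all comments, with flagged ones marked.
function isVisibleTo(viewerId: string, comment: Comment, post: Post, byId: Map<string, Comment>): boolean {
  return viewerId === post.authorId ? !isFlagged(comment, post, byId) : true;
}
```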
If this still seems a bad idea, can you explain why in more detail?
TBC, I think the people demanding unilateral bans will find my proposal unacceptable, due to one of my "uncharitable hypotheses", basically for status/ego/political reasons, or subconsciously wanting to discourage certain critiques or make them harder to find, and the LW team, in order to appeal to them, will keep the ban system in place. One of my purposes here is just to make this explicit and clear (if it is indeed the case).
Each muted comment/thread is marked/flagged by an icon, color, or text, to indicate to readers that the OP author can't see it, and if you reply to it, your reply will also be hidden from the author.
So like, do you distrust writers using Substack? Because Substack writers can just ban people from commenting. Or more concretely, do you distrust Scott to garden his own space on ACX?
It's normally out of my mind, but whenever I'm reminded of it, I'm like "damn, I wonder how many mistaken articles I read and didn't realize it because the author banned or discouraged their best would-be critics." (Substack has other problems though, like a lack of karma that makes it hard to find good comments anyway, which I would want to fix first.)
...Giving authors th
Comment threads are conversations! If you have one person in a conversation who can't see other participants, everything gets confusing and weird.
The additional confusion seems pretty minimal, if the muted comments are clearly marked so others are aware that the author can't see them. (Compare to the baseline confusion where I'm already pretty unsure who has read which other comments.)
I just don't get how this is worse than making it so that certain perspectives are completely missing from the comments.
I really don't get the psychology of people who won't use a site without being able to unilaterally ban people (or rather I can only think of uncharitable hypotheses). Why can't they just ignore those they don't want to engage with, maybe with the help of a mute or ignore feature (which can also mark the ignored comments/threads in some way to notify others)?
Gemini Pro's verdict on my feature idea (after asking it to be less fawning): The refined "Mute-and-Flag" system is a functional alternative, as it solves the author's personal need to be shielded from...
Why can't they just ignore those they don't want to engage with, maybe with the help of a mute or ignore feature (which can also mark the ignored comments/threads in some way to notify others)?
I get a sense that you (and Said) are really thinking of this as a 1-1 interaction and not a group interaction, but the group dynamics are where most of my crux is.
I feel like all your proposals are like “have a group convo but with one person blanking another person” or “have a 6-person company where one person just ignores another person” and all of my proposals ar...
It's a localized silencing, which discourages criticism (beyond just the banned critic) and makes remaining criticism harder to find, and yes makes it harder to tell that the author is ignoring critics. If it's not effective at discouraging or hiding criticism, then how can it have any perceived benefits for the author? It's gotta have some kind of substantive effect, right? See also this.
I think giving people the right and responsibility to unilaterally ban commenters on their posts is demanding too much of people's rationality, forcing them to make evaluations when they're most likely to be biased, and tempting them with the power to silence their harshest or most effective critics. I personally don't trust myself to do this and have basically committed to not ban anyone or even delete any comments that aren't obvious spam, and kind of don't trust others who would trust themselves to do this.
Banning someone does not generally silence their harshest critics. It just asks those critics to make a top-level post, which is generally the format that actually has a shot at improving the record and discourse in reasonable ways, compared to nested comment replies.
The thing that banning does is make it so the author doesn't look like he is ignoring critics (which, by the time he has consciously decided to ban a critic, he isn't actually doing).
My argument is roughly that religions uniquely provide a source of meaning, community, and life guidance not available elsewhere
Why is it good to obtain a source of meaning, if it is not based on sound epistemic foundations? Is obtaining an arbitrary "meaning" better than living without one or going with an "interim meaning of life" like "maximize option value while looking for a philosophically sound source of normativity"?
Thanks for letting me know. Is there anything on my list that you don't think is a good idea or probably won't implement, in which case I might start working on them myself, e.g. as a userscript? Especially #5, which is also useful for other reasons, like archiving and searching.
What do people think about having more AI features on LW? (Any existing plans for this?) For example:
This contradicts my position in Some Thoughts on Metaphilosophy. What about that post do you find unconvincing, or what is your own argument for "philosophy being insoluble"?
I'm not saying that my assessment of it is inarguably correct (indeed, given that mainstream philosophy isn't seriously discredited yet, reasonable people clearly can disagree), but if your conclusions are different, I'd like to know why.
It's mainly because when I'm (seemingly) making philosophical progress myself, e.g., this and this, or when I see other people making apparent philosophical progress, it looks more like "doing what most philosophers do" than "getting feedback from reality".
Perhaps more seriously, the philosophers who got a temporary manpower and influence boost from the invention of math and science should have worked much harder to solve metaphilosophy, while they had the advantage.
It seems to me that values have been a main focus of philosophy for a long time, with moral philosophy (or perhaps meta-ethics if the topic is "what values are") devoted to it and discussed frequently both in academia and out, whereas metaphilosophy has received much less attention. This implies that we know progress on understanding values is probably pretty hard on the current margins, whereas there's a lot more uncertainty about the difficulty of metaphilosophy. Solving the latter would also be of greater utility, since it makes solving all other philosophical problems easier, not just values. I'm curious about the rationale behind your suggestion.
An example of a long-standing philosophical problem that could eventually be solved in this way is the problem of consciousness: if we're eventually able to build artificial brains and "upload" ourselves, by testing different designs we'd be able to figure out which material features give rise to qualia experiences, and by what mechanisms.
I think this will help, but won't solve the whole problem by itself, and we'll still need to decide between competing answers without direct feedback from reality to help us choose. Like today, there are people who den...
I have no idea whether marginal progress on this would be good or bad
Is it because of one of the reasons on this list, or something else?
Math and science as original sins.
From Some Thoughts on Metaphilosophy:
...Philosophy as meta problem solving: Given that philosophy is extremely slow, it makes sense to use it to solve meta problems (i.e., finding faster ways to handle some class of problems) instead of object level problems. This is exactly what happened historically. Instead of using philosophy to solve individual scientific problems (natural philosophy), we use it to solve science as a methodological problem (philosophy of science). Instead of using philosophy to solve individual math proble
I am essentially a preference utilitarian
Want to try answering my questions/problems about preference utilitarianism?
Maybe I would state my first question above a little differently today: Certain decision theories (such as the UDT/FDT/LDT family) already incorporate some preference-utilitarian-like intuitions, by suggesting that taking certain other agents' preferences into account when making certain decisions is a good idea, if e.g. this is logically correlated with them taking your preferences into account. Does preference utilitarianism go beyond t...
Sorry about the delayed reply. I've been thinking about how to respond. One of my worries is that human philosophy is path dependent, or another way of saying this is that we're prone to accepting wrong philosophical ideas/arguments and then it's hard to talk us out of them. The split of western philosophy into analytical and continental traditions seems to be an instance of this, then even within analytical philosophy, academic philosophers would strongly disagree with each other and each be confident in their own positions and rarely get talked out of th...
MacAskill is probably the most prominent, with his "value lock-in" and "long reflection", but in general the notion of philosophical confusion/inadequacy seems a common component of various AI risk cases. I've been particularly impressed by John Wentworth.
That's true, but neither of them have talked about the more general problem "maybe humans/AIs won't be philosophically competent enough, so we need to figure out how to improve human/AI philosophical competence", or at least haven't said this publicly or framed their positions this way.
...The point is
Interesting. Who are they and what approaches are they taking? Have they said anything publicly about working on this, and if not, why?
My impression is that those few who at least understand that they're confused do that
Who else is doing this?
Not exactly an unheard of position.
All of your links are to people proposing better ways of doing philosophy, which contradicts the claim that it's impossible to make progress in philosophy.
policymakers aren't predisposed to taking arguments from those quarters seriously
There are various historical instances of philosophy having large effects on policy (not always in a good way), e.g., abolition of slavery, rise of liberalism ("the Enlightenment"), Communism ("historical materialism").
It seems clear enough to me that pretty much everybody is hopelessly confused about these issues, and sees no promising avenues for quick progress.
If that's the case, why aren't they at least raising the alarm for this additional AI risk?
..."What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is exactly those areas where we can have as much feedback from reality in
Given typical pace and trajectory of human philosophical progress, I think we're unlikely to make much headway on the relevant problems (i.e., not enough to have high justified confidence that we've correctly solved them) before we really need the solutions, but various groups will likely convince themselves that they have, and become overconfident in their own proposed solutions. The subject will likely end up polarized and politicized, or perhaps ignored by most as they take the lack of consensus as license to do whatever is most convenient.
Even if the q...
urged that it be retracted
This seems substantially different from "was retracted" in the title. Also, Arxiv apparently hasn't yet followed MIT's request to remove the paper, presumably following its own policy and waiting for the author to issue his own request.
How do you decide what to set ε to? You mention "we want assumptions about humans that are sensible a priori, verifiable via experiment" but I don't see how ε can be verified via experiment, given that for many questions we'd want the human oracle to answer, there isn't a source of ground truth answers that we can compare the human answers to?
With unbounded Alice and Bob, this results in an equilibrium where Alice can win if and only if there is an argument that is robust to an ε-fraction of errors.
How should I think about, or build up some intuitions...
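(For what it's worth, the way I'd tentatively formalize the quoted claim, to build intuition about it, is something like the following; the notation is mine and may not match the post's actual definitions. Here $H$ is the intended human oracle, $\tilde{H}$ a corrupted oracle, $V$ the verifier, and $A$ Alice's argument.)

```latex
% A hedged reading of the quoted equilibrium claim (my notation, not the post's):
% Alice wins iff some argument A convinces the verifier under every oracle that
% differs from the intended human oracle on at most an epsilon fraction of queries.
\[
  \text{Alice wins} \iff
  \exists A \;\; \forall \tilde{H} \text{ with } \Pr_x\!\bigl[\tilde{H}(x) \neq H(x)\bigr] \le \epsilon :
  \quad V^{\tilde{H}}(A) = \text{accept}.
\]
```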
I think the most dangerous version of 3 is a sort of Chesterton's fence, where people get rid of seemingly unjustified social norms without realizing that they were socially beneficial. (Decline in high-g birthrates might be an example.) Though social norms are instrumental values, not beliefs, and when a norm was originally motivated by a mistaken belief, it can still be motivated by recognizing that the norm is useful, which doesn't require holding on to the mistaken belief.
I think that makes sense, but sometimes you can't necessarily motivate a usef...
Some potential risks stemming from trying to increase philosophical competence of humans and AIs, or doing metaphilosophy research. (1 and 2 seem almost too obvious to write down, but I think I should probably write them down anyway.)
Is it something like "during deployment, the simulated human judges might be asked to answer questions far outside the training distribution, and so they might fail to accurately simulate humans (or humans might be worse than on the training distribution)"?
Yes, but my concern also includes this happening during training of the debaters, when the simulated or actual humans can also go out of distribution, e.g., the actual human is asked a type of question that he has never considered before, and either answers in a confused way, or will have to use philosophical reasoning and a ...
For example, small groups of humans can invent grammatical languages from scratch, and of course historically humans invented science and tech and philosophy and so on from scratch.
I think this could be part of a viable approach, for example if we figure out in detail how humans invented philosophy and use that knowledge to design/train an AI that we can have high justified confidence will be philosophically competent. I'm worried that in actual development of brain-like AGI, we will skip this part (because it's too hard, or nobody pushes for it), and e...
Of course, if the questions on which we need to use AI advice force those distributions to skew too much, and there’s no way for debaters to adapt and bootstrap from on-distribution human data, that will mean our protocol isn’t competitive.
This is my concern, and I'm glad it's at least on your radar. How do you / your team think about competitiveness in general? (I did a simple search and the word doesn't appear in this post or the previous one.) How much competitiveness are you aiming for? Will there be a "competitiveness case" later in this sequence, ...
I'm curious if your team has any thoughts on my post Some Thoughts on Metaphilosophy, which was in large part inspired by the Debate paper, and also seems relevant to "Good human input" here.
Specifically, I'm worried about this kind of system driving the simulated humans out of distribution, either gradually or suddenly, accidentally or intentionally. And distribution shift could cause problems either with the simulation (presumably similar to or based on LLMs instead of low-level neuron-by-neuron simulation), or with the human(s) themselves. In my post, I...
Maybe tweak the prompt with something like, "if your guess is a pseudonym, also give your best guess(es) of the true identity of the author, using the same tips and strategies"?
Can you try this on Satoshi Nakamoto's writings? (Don't necessarily reveal their true identity, if it ends up working, and your attempt/prompt isn't easily reproducible. My guess is that some people have tried already, and failed, either because AI isn't smart enough yet, or they didn't use the right prompts.)
We humans also align with each other via organic alignment.
This kind of "organic alignment" can fail in catastrophic ways, e.g., produce someone like Stalin or Mao. (They're typically explained by "power corrupts" but can also be seen as instances of "deceptive alignment".)
Another potential failure mode is that "organically aligned" AIs start viewing humans as parasites instead of important/useful parts of their "greater whole". This also has plenty of parallels in biological systems and human societies.
Both of these seem like very obvious risks/objection...
No, because power/influence dynamics could be very different in CEV compared to the current world, and it seems reasonable to distrust CEV in principle or in practice, and/or CEV may be sensitive to initial conditions, which would give a lot of leverage to influencing opinions before it starts.
Does everyone here remember and/or agree with my point in The Nature of Offense, that offense is about status, which in the current context implies that it's essentially impossible to avoid giving offense while delivering strong criticism (as it almost necessarily implies that the target of criticism deserves lower status for writing something seriously flawed, having false/harmful beliefs, etc.)? @habryka @Zack_M_Davis @Said Achmiz
This discussion has become very long and I've been travelling so I may have missed something, but has anyone managed to ... (read more)