leogao's Shortform
leogao · 3d

i recently ran into a vegan advocate tabling in a public space, and spoke briefly to them for the explicit purpose of better understanding what it feels like to be the target of advocacy on something i feel moderately sympathetic towards but not fully bought in on. (i find this kind of thing very valuable for noticing flaws in myself and improving; it's much harder to be perceptive of one's own actions otherwise). the part where i am genuinely quite plausibly persuadable of his position in theory is important; i think if i had talked to e.g. flat earthers, one might say my reaction is just because i'd already decided not to be persuaded. several interesting things i noticed (none of which should be surprising or novel, especially for someone less autistic than me, but as they say, intellectually knowing things is not the same as actual experience):

  • this guy certainly knew more about e.g. health impacts of veganism than i did, and i would not have been able to hold my own in an actual debate.
    • in particular, it's really easy for actually-good-in-practice heuristics to come out as logical fallacies, especially when arguing with someone much more familiar with the object level details than you are.
    • interestingly, since i was pushing the conversation in a pretty meta direction, he actually explicitly said something to the effect that he's had thousands of conversations like this and has a response to basically every argument i could make, do i really think i have something he hasn't heard before, etc. in that moment i realized this was probably true, and that this nonetheless did not necessarily mean that he was correct in his claim. and in addition it certainly didn't make me feel any more emotionally willing to accept his argument
    • in the past, i've personally had the exact experience of arguing for something where i had enough of a dialogue tree that other people couldn't easily find any holes, yet the other people were unconvinced, and i felt really confused about why they weren't seeing the very straightforward argument. then later it turned out i was actually just wrong and the other people were applying correct heuristics
      • my guess is at the extreme, with sufficient prep and motivation, you can get in this position for arbitrarily wrong beliefs. like probably if i talked to flat earthers for a while i'd get deep enough in their dialogue tree that i'd stop being able to refute them on the object level and would (for the purposes of my own epistemics, not to convince an external audience) have to appeal to cognitive heuristics that are isomorphic to some cognitive fallacies.
    • of course we shouldn't always appeal to the cognitive heuristics: doing so is almost always reasonable, and yet sometimes you will miss out on the one thing that actually does matter. to do anything interesting you do have to eventually dig into some particular spicy claims and truly resolve things at the object level. but there are so many things in the world, and resolving them takes so much time, that you need some heuristics to reject a whole bunch of things out of hand and focus your energy on the things that matter.
      • like, i could invest energy until i can actually refute flat earthers completely on the object level, and i'd almost certainly succeed. but this would be a huge waste of time. on the other hand, i could also just never look into anything and say "nothing ever happens". but every important thing to ever happen did, in fact, happen at some point [citation needed].
  • it's really really irritating to be cut off mid sentence. this is hard to admit because i also have an unconscious tendency to do this (currently working on fixing this) and my guess is other people get very annoyed when i do this to them.
    • sometimes i do enjoy being cut off in conversations, but on reflection this is only when i feel like (a) the conversation is cooperative enough that i feel like we're trying to discover the truth together, and (b) the other person actually understands what i'm saying before i finish saying it. but since these conditions are much rarer and require high levels of social awareness to detect, it's a good first-order heuristic that interrupting people is bad.
  • i found it completely unhelpful to be told that he was also in my shoes X years ago with similar uncertainties when he was deciding to become vegan; or to be told that he had successfully convinced Y other people to become vegan; or to be subject to what i want to call "therapy speak". i only want to therapyspeak with people i feel relatively close to, and otherwise it comes off as very patronizing.
    • i think there's a closely related thing, which is genuine curiosity about people's views. it uses similar phrases like "what makes you believe that?" but has a very different tone and vibe.
    • his achievements mean a lot more to himself than to me. i don't really care that much what he's accomplished for the purposes of deciding whether his argument is correct. any credibility points conferred are more than cancelled out by it being kind of annoying. even if it is true, there's nothing more annoying than hearing someone say "i've thought about this more than you / accomplished more than you have because of my phd/experience/etc so you should listen to me", unless you really really really trust this person.
      • the calculus changes when there is an audience.
    • therapyspeak is still probably better than nothing, and can be a useful stepping stone for the socially incompetent

one possible take is that i'm just really weird and these modes of interaction work better for normal people, because they think less independently or need to be argued out of having poorly thought out bad takes or something like that, idk. i can't rule this out, but my guess is normal people probably react this way even more than i do. also, for the purposes of analogy to the AI safety movement, presumably we want to select for people who are independent thinkers with especially well thought out takes, more than just normal people.

also my guess is this particular interaction was probably extremely out of distribution from the perspective of those tabling. my guess is activists generally have a pretty polished pitch for most common situations which includes a bunch of concrete ways of talking they've empirically found to cause people to engage, learned through years of RL against a general audience, but the polishedness of this pitch doesn't generalize out of distribution when poked at in weird ways. my interlocutor even noted at some point that his conversations when tabling generally don't go the way ours went.

Elizabeth's Shortform
leogao · 7d

yeah, but there would also be a lot of worlds where the merger would have been totally fine and beneficial, but fell through because people had unfounded fears

Elizabeth's Shortform
leogao · 8d

i mean, in general, it's a lot easier to tell plausible-seeming stories of things going really poorly than actually high-likelihood stories of things going poorly. so the anecdata of it actually happening is worth a lot

Safety researchers should take a public stance
leogao · 8d

I've been repeatedly loud and explicit about this but am happy to state again that racing to build superintelligence before we know how to make it not kill everyone (or cause other catastrophic outcomes) seems really bad and I wish we could coordinate to not do that.

leogao's Shortform
leogao · 9d

there's an exogenous factor, which is that the entire country was shifting leftward during the 50s and 60s. it's plausible that the 1964 bill would have passed anyways without the 1957 bill, possibly even earlier

leogao's Shortform
leogao · 9d

what's the current state of analysis on whether the civil rights act of 1957 was actually net positive or negative for civil rights in hindsight? there are two possible stories one can tell, and at the time people were arguing about which is correct:

  1. passing even a useless civil rights bill is a lot better than nothing because it sets a precedent that getting civil rights bills through the Senate is possible / makes the southern coalition no longer look invincible. this serves as a useful coordination mechanism because people only want to support things that they think other people will support.
  2. passing a useless civil rights bill is worse than no bill because it creates a false sense of progress and makes it feel like something was done even when nothing was. to the extent that the bill signals to people that getting civil rights bills through the Senate is possible, this is a false impression because the only reason the bill could get through was that it was watered down to uselessness.

this feels directly analogous to the question of whether we should accept very weak AI safety regulations today. 

leogao's Shortform
leogao · 11d

a thing i've noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we've modeled all the uncertainty (or at least, more than other people who haven't). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i'm doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i've considered some caveat that one might think i forgot to consider

i've heard others make a similar critique about this as a communication style which can mislead non-rats who are not familiar with the style, but i'm making a different claim here that one can trick oneself.

it seems that one often believes that being self-aware of a certain limitation is enough to correct for it sufficiently to at least be calibrated about how limited one is. a concrete example: part of being socially incompetent is not just being bad at taking social actions, but being bad at detecting social feedback on those actions. of course, many people are not even aware of the latter. but many are aware of and acknowledge the latter, and then act as if, because they've acknowledged a potential failure mode and will try to be careful to avoid it, they are much less susceptible to the failure mode than other people in an otherwise similar reference class.

one variant of this deals with hypotheticals - because hypotheticals often can/will never be evaluated, this allows one to get the feeling that one is being epistemically virtuous and making falsifiable predictions, without ever actually getting falsified. for example, a statement "if X had happened, then i bet we would see Y now" has prediction vibes but is not actually a prediction. this is especially pernicious when one fails but says "i failed but i was close, so i should still update positively on what i did." while not always a bad idea, there's a bias-variance tradeoff here, where doing this more often reduces variance but increases bias. i find that cases where i thought i was close but later realized i was actually far off the mark are sufficiently common that this isn't an imaginary concern.

another variant is that i think we are much less susceptible to some forms of brainworms/ideology, and are also much better at understanding the mechanisms behind brainworms and identifying them in others, so we over-update on our own insusceptibility to brainworms (despite evidence from the reference class of rationalists that seems to suggest at least as much obvious cult-forming as genpop, if not more). however, it's just that we are susceptible to different types of brainworms than normies.

another variant is introspective ability. i think we are probably better in some sense at self-introspection, in the sense that we are better at noticing certain kinds of patterns in our own behavior and developing models for those patterns. but i've also come to believe that this kind of modeling has huge blind spots, and leads many to believe they have a much greater degree of mastery over their own minds than they actually do. however, feeling aware of the possibility of having blind spots, and of what they often look like in other people, can lead to overconfidence about whether one would notice these blind spots in oneself.

i feel like the main way i notice these things is by noticing them in other people over long periods of knowing them, and then noticing that my actions are actually deeply analogous to theirs in some way. it also helps to notice non-rats not falling into the same pitfalls sometimes.

i'm not sure how to fix this. merely being aware of it probably is not sufficient. probably the solution is not to stop thinking about one's own limitations, but rather to add some additional cogtech on top. my guess is there is probably valuable memetic technology out there that especially wise people use but which most people, rat or not, don't use. also, difficult-to-fake feedback from reality seems important.

Christian homeschoolers in the year 3000
leogao · 11d

my guess is that lots of people would change their minds if they really reflected on it with full wisdom and the assistance of an aligned and emotionally intelligent assistant. but if truly deep down some/many people value their beliefs over truth, and would never change their minds even if they reflected deeply on it, who are we to tell them not to do that? the best we can ask for is that they leave us alone to do what we believe is good, and vice versa.

Vladimir_Nesov's Shortform
leogao · 11d

in general publicly known training techniques are behind sota, so this should be taken into account.

Visual Exploration of Gradient Descent (many images)
leogao · 11d

I like the spirit of this work but it would benefit a lot from a more in-depth review of the existing literature and methodologies. some examples (non-exhaustive):

  • the piecewise approximation thing is a pretty widely accepted opinion in ML
  • visualizing the loss landscape as a plane between three points in model space is pretty common in the field, and often the landscape is a lot more nontrivial (see the plane sketch after this list).
  • a factor of 3 loss difference is huge! if you want to claim that smooth actfn is better beyond what's explained by the loss, you need to compare two models with the same loss but different actfn.
  • the post just hand-waves away the difference between SGD and Adam. this is an important difference! Adam tries to take ~constant-sized steps along each axis direction (see the Adam sketch after this list).
  • local approximation of the loss landscape as approximately quadratic is pretty widely accepted; generally people look at the eigenvalues of the Hessian to try to understand the local shape of the loss landscape (see the Hessian sketch after this list).
  • scaling the gradient 500x is less impactful than it sounds, because changes to the gradient scale matter way less than you'd expect (they get multiplied out by (1-beta2) in the second-moment estimate); this is unlike SGD, where gradient scaling is equivalent to LR scaling (the Adam sketch below also shows this).
  • learning rate decay is an important part of real training that substantially affects many conclusions
  • to compare models, if possible you generally want to train into the L(D) regime (loss has stopped improving at all), or pick some principled criterion for stopping early compute-optimally (L(C)).
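to make the plane-visualization point concrete, here's a minimal sketch of the standard technique (my own illustration, not from the post; `loss_fn` and the checkpoint vectors `theta_a/b/c` are hypothetical names, and i'm assuming the loss can be evaluated on a flat 1D parameter vector): take three checkpoints, orthonormalize the two in-plane directions, and evaluate the loss on a grid in that plane.

```python
import torch

def loss_on_plane(loss_fn, theta_a, theta_b, theta_c, n=25, margin=0.2):
    # theta_* are flat 1D parameter vectors (e.g. via torch.nn.utils.parameters_to_vector)
    u = theta_b - theta_a
    v = theta_c - theta_a
    u_hat = u / u.norm()
    v_perp = v - (v @ u_hat) * u_hat      # Gram-Schmidt: drop the component along u
    v_hat = v_perp / v_perp.norm()

    # grid of points in the plane spanned by (u_hat, v_hat), anchored at theta_a;
    # theta_b sits at (|u|, 0) and theta_c at (v @ u_hat, |v_perp|)
    xs = torch.linspace(-margin, 1 + margin, n) * u.norm()
    ys = torch.linspace(-margin, 1 + margin, n) * v_perp.norm()
    grid = torch.zeros(n, n)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            grid[i, j] = loss_fn(theta_a + x * u_hat + y * v_hat)
    return grid  # contour-plot this to see how (non)trivial the landscape is between checkpoints
```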
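here's the Adam sketch: a textbook Adam update written out by hand (just an illustration with made-up numbers, not the post's setup), showing that the per-coordinate step size is roughly the learning rate regardless of raw gradient magnitude, and that rescaling the gradient 500x barely changes the step, whereas for plain SGD it's exactly equivalent to a 500x larger LR.

```python
import torch

def adam_step(g, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, t=1):
    # one textbook Adam update for gradient g with optimizer state (m, v) at step t
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return lr * m_hat / (v_hat.sqrt() + eps), m, v  # per-coordinate step is ~lr in size

g = torch.randn(5)
zeros = torch.zeros(5)
step, _, _ = adam_step(g, zeros, zeros)
step_scaled, _, _ = adam_step(500 * g, zeros, zeros)
print(step, step_scaled)            # nearly identical: the overall gradient scale divides back out
print(1e-3 * g, 1e-3 * 500 * g)     # plain SGD steps: the scaled gradient gives a 500x larger step
```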
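and the Hessian sketch: the usual way to probe the local quadratic shape without materializing the full Hessian is power iteration on Hessian-vector products (again my own illustration; `loss_fn` and `params` are hypothetical names, with `params` a flat tensor that has requires_grad=True).

```python
import torch

def top_hessian_eigenvalue(loss_fn, params, iters=20):
    # estimate the largest-magnitude Hessian eigenvalue at `params` by power iteration
    # on Hessian-vector products (never forms the full Hessian)
    v = torch.randn_like(params)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        loss = loss_fn(params)
        (grad,) = torch.autograd.grad(loss, params, create_graph=True)
        (hv,) = torch.autograd.grad(grad @ v, params)  # Hessian-vector product H @ v
        eig = (v @ hv).item()                          # Rayleigh quotient estimate of the eigenvalue
        v = hv / hv.norm()
    return eig
```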
Posts:
  • My takes on SB-1047 (1y)
  • Scaling and evaluating sparse autoencoders (1y)
  • Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision (2y)
  • Shapley Value Attribution in Chain of Thought (2y)
  • [ASoT] Some thoughts on human abstractions (3y)
  • Clarifying wireheading terminology (3y)
  • Scaling Laws for Reward Model Overoptimization (3y)
  • How many GPUs does NVIDIA make? (3y)
  • Towards deconfusing wireheading and reward maximization (3y)
  • Humans Reflecting on HRH (3y)