So it's been a few months since SB1047. My sense of the main events that have happened since the peak of LW commenter interest (might have made mistakes or missed some items) is:
Note that the lozenges dissolve slowly, so (bad news) you'd have the taste around for a while but (good news) it's really not a very strong peppermint flavor while it's in your mouth, and in my experience it doesn't really have much of the menthol-triggered cooling effect. My guess is that you would still find it unpleasant, but I think there's a decent chance you won't really mind. I don't know of other zinc acetate brands, but I haven't looked carefully; as of 2019 the claim on this podcast was that only the Life Extension brand is any good.
On my model of what's going on, you probably want the lozenges to spend a while dissolving, so that you have fairly continuous exposure of throat and nasal tissue to the zinc ions. I find that they taste bad and astringent if I actively suck on them but are pretty unobtrusive if they just gradually dissolve over an hour or two (sounds like you had a similar experience). I sometimes cut the lozenges in half and let each half dissolve so that they fit into my mouth more easily; you might want to give that a try.
I agree, zinc lozenges seem like they're probably really worthwhile (even in the milder-benefit worlds)! My less-ecstatic tone is only relative to the promise of older LessWrong posts that suggested zinc could basically solve all viral respiratory infections, but maybe I should have made the "but actually though, buy some zinc lozenges" takeaway more explicit.
I liked this post, but I think there's a good chance that the future doesn't end up looking like a central example of either "a single human seizes power" or "a single rogue AI seizes power". Some other possible futures:
The action-relevant question, for deciding whether you want to try to solve alignment, is how the average world with human-controlled AGI compares to the average AGI-controlled world.
To nitpick a little, it's more like "the average world where we just barely didn't solve alignment, versus the average world where we just barely did" (to the extent making things binary in this way is sensible), which I think does affect the calculus a little - marginal AGI-controlled worlds are more likely to have AIs which maintain some human values.
(Though one might ...
My impression is that since zinc inhibits viral replication, it's most useful in the regime where viral populations are still growing and your body hasn't figured out how to beat the virus yet. So getting started ASAP is good, but it's likely still helpful through the first 2-3 days of the illness.
An important part of the model here that I don't understand yet is how your body's immune response varies as a function of viral populations - e.g. two models you could have are
The 2019 LW post discusses a podcast which talks a lot about gears-y models and proposed mechanisms; as I understand it, the high level "zinc ions inhibit viral replication" model is fairly well accepted, but some of the details around which brands are best aren't as well-attested elsewhere in the literature. For instance, many of these studies don't use zinc acetate, which this podcast would suggest is best. (To its credit, the 2013 meta-analysis does find that acetate is (nonsignificantly) better than gluconate, though not radically so.)
(TLDR: Recent Cochrane review says zinc lozenges shave 0.5 to 4 days off of cold duration with low confidence, middling results for other endpoints. Some reason to think good lozenges are better than this.)
There's a 2024 Cochrane review on zinc lozenges for colds that's come out since the LessWrong posts on the topic from 2019, 2020, and 2021. 34 studies, 17 of which used lozenges; 9 of those 17 used gluconate, and I assume most of the rest used acetate, but they don't say. Not on sci-hub or Anna's Archive, so I'm just going off the abstract and summary here; would love a P...
I agree this seems pretty good to do, but I think it'll be tough to rule out all possible contaminant theories with this approach:
I've gotten enormous value out of LW and its derived communities during my life, at least some of which is attributable to the LW2.0 revival and its effects on those communities. More recently, since moving to the Bay, I've been very excited by a lot of the in-person events that Lighthaven has helped facilitate. Also, LessWrong is doing so many things right as a website and source-of-content that no one else does (karma-gated RSS feeds! separate upvote and agree-vote! built-in LaTeX support!) and even if I had no connection to the other parts of its missio...
So I would guess it should be possible to post-train an LLM to give answers like "................... Yes" instead of "Because 7! contains both 3 and 5 as factors, which multiply to 15 Yes", and the LLM would still be able to take advantage of CoT
This doesn't necessarily follow - on a standard transformer architecture, this will give you more parallel computation but no more serial computation than you had before. The bit where the LLM does N layers' worth of serial thinking to say "3" and then that "3" token can be fed back into the start of N more layers...
I don't think that's true - in eg the GPT-3 architecture, and in all major open-weights transformer architectures afaik, the attention mechanism is able to feed lots of information from earlier tokens and "thoughts" of the model into later tokens' residual streams in a non-token-based way. It's totally possible for the models to do real introspection on their thoughts (with some caveats about eg computation that occurs in the last few layers), it's just unclear to me whether in practice they perform a lot of it in a way that gets faithfully communicated to the user.
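For concreteness, here's a minimal single-head causal self-attention sketch in numpy (toy dimensions and random stand-in weights, nothing from any real model): the point is just that position t's attention output is assembled from earlier positions' residual-stream values, not from the tokens those positions happened to emit.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 16, 5
resid = rng.normal(size=(seq_len, d_model))   # residual stream at each position

# random stand-ins for learned projection matrices
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = resid @ W_q, resid @ W_k, resid @ W_v

scores = Q @ K.T / np.sqrt(d_model)
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores[~causal] = -np.inf                     # causal mask: each position sees only itself and the past

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# position t's output mixes earlier positions' internal states (their value vectors),
# with no dependence on which tokens were actually sampled at those positions
attn_out = weights @ V
```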
Yeah, I'm thinking about this in terms of introspection on non-token-based "neuralese" thinking behind the outputs. I agree that if you conceptualize the LLM as the entire process that outputs each user-visible token (including potentially a lot of CoT-style reasoning that the model can see but the user can't), and think of "introspection" as "the ability to reflect on the non-user-visible process generating user-visible tokens", then models can definitely attain that; but I didn't read the original post as referring to that sort of behavior.
In other words, they can think about the thoughts "behind" the previous words they wrote. If you doubt me on this, try asking one what its words are referring to, with reference to its previous words. Its "attention" modules are actually intentionally designed to know this sort of thing, using key/query/value lookups that occur "behind the scenes" of the text you actually see on screen.
I don't think that asking an LLM what its words are referring to is a convincing demonstration that there's real introspection going on in there, as opposed to "plausi...
I think my original comment was ambiguous - I also consider myself to have mostly figured it out, in that I thought through these considerations pretty extensively before joining and am in a "monitoring for new considerations or evidence or events that might affect my assessment" state rather than a "just now orienting to the question" state. I'd expect to be most useful to people in shoes similar to my past self (deciding whether to apply or accept an offer) but am pretty happy to talk to anyone, including eg people who are confident I'm wrong and want to convince me otherwise.
Thanks for clearing that up. It sounds like we’re thinking along very similar lines, but that I came to a decision to stop earlier. From a position inside one of the major AI labs, you’ll be positioned to more correctly perceive when the risks start outweighing the benefits. I was perceiving events more remotely from over here in Boston, and from inside a company that uses AI as one of a number of tools, not as the main product.
I’ve been aware of the danger of superintelligence since the turn of the century, and I did my “just now orienting...
See my reply to Ryan - I'm primarily interested in offering advice on something like that question since I think it's where I have unusually helpful thoughts; I don't mean to imply that this is the only question that matters in making these sorts of decisions! Feel free to message me if you have pitches for other projects you think would be better for the world.
Yeah, I agree that you should care about more than just the sign bit. I tend to think the magnitude of effects of such work is large enough that "positive sign" often is enough information to decide that it dominates many alternatives, though certainly not all of them. (I also have some kind of virtue-ethical sensitivity to the zero point of the impacts of my direct work, even if second-order effects like skill building or intra-lab influence might make things look robustly good from a consequentialist POV.)
The offer of the parent comment is more narrowly ...
I work on a capabilities team at Anthropic, and in the course of deciding to take this job I've spent[1] a while thinking about whether that's good for the world and which kinds of observations could update me up or down about it. This is an open offer to chat with anyone else trying to figure out questions of working on capability-advancing work at a frontier lab! I can be reached at "graham's number is big" sans spaces at gmail.
and still spend - I'd like to have Joseph Rotblat's virtue of noticing when one's former reasoning for working on a projec
I’m not “trying to figure out” whether to work on capabilities, having already decided I’ve figured it out and given up such work. Are you interested in talking about this to someone like me? I can’t tell whether you want to restrict discussion to people who are still in the figuring out stage. Not that there’s anything wrong with that, mind you.
Isn't the most relevant question whether it is the best choice for you? (Taking into account your objectives which are (mostly?) altruistic.)
I'd guess having you work on capabilities at Anthropic is net good for the world[1], but probably isn't your best choice long run and plausibly isn't your best choice right now. (I don't have a good understanding of your alternatives.)
My current view is that working on capabilites at Anthropic is a good idea for people who are mostly altruistically motivated if and only if that person is very comparatively advantaged ...
I agree it seems unlikely that we'll see coordination on slowing down before one actor or coalition has a substantial enough lead over other actors that it can enforce such a slowdown unilaterally, but I think it's reasonably likely that such a lead will arise before things get really insane.
A few different stories under which one might go from aligned "genius in a datacenter" level AI at time t to outcomes merely at the level of weirdness in this essay at t + 5-10y:
(I work at Anthropic.) My read of the "touch grass" comment is informed a lot by the very next sentences in the essay:
But more importantly, tame is good from a societal perspective. I think there's only so much change people can handle at once, and the pace I'm describing is probably close to the limits of what society can absorb without extreme turbulence.
which I read as saying something like "It's plausible that things could go much faster than this, but as a prediction about what will actually happen, humanity as a whole probably doesn't want thing...
humanity as a whole probably doesn't want things to get incredibly crazy so fast, and so we're likely to see something tamer
Doesn't this require a pretty strong and unprecedented level of international coordination on stopping an obviously immediately extremely valuable and militarily relevant technology? I think a US-backed entente could impose this on the rest of the world, but that would also be an unprecedentedly large effort.
I think this is certainly possible and I hope this level of coordination happens, but I don't exactly think this is likely in...
(I work on capabilities at Anthropic.) Speaking for myself, I think of international race dynamics as a substantial reason that trying for global pause advocacy in 2024 isn't likely to be very useful (and this article updates me a bit towards hope on that front), but I think US/China considerations get less than 10% of the Shapley value in me deciding that working at Anthropic would probably decrease existential risk on net (at least, at the scale of "China totally disregards AI risk" vs "China is kinda moderately into AI risk but somewhat less than the US...
A proper Bayesian currently at less than 0.5% credence for a proposition P should assign a less than 1 in 100 chance that their credence in P rises above 50% at any point in the future. This isn't a catch for someone who's well-calibrated.
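Sketch of why, assuming only conservation of expected evidence (so your sequence of credences is a martingale): if $T$ is the first time your credence in $P$ reaches $50\%$, then

$$0.5 \cdot \Pr(T < \infty) \;\le\; \mathbb{E}[\text{credence at } T] \;=\; \text{current credence} \;<\; 0.005,$$

so the chance of ever crossing $50\%$ is below $1$ in $100$.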
In the example you give, the extent to which it seems likely that critical typos would happen and trigger this mechanism by accident is exactly the extent to which an observer of a strange headline should discount their trust in it! Evidence for unlikely events cannot be both strong and probable-to-appear, or the events would not be unlikely.
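The same point as a one-line bound: if observing $E$ would push a rational observer's credence in $H$ up to at least $q$, then

$$\Pr(H) \;\ge\; \Pr(H \mid E)\,\Pr(E) \;\ge\; q\,\Pr(E), \qquad \text{so} \qquad \Pr(E) \;\le\; \frac{\Pr(H)}{q}.$$

Evidence that would make an a-priori-unlikely $H$ probable must itself be unlikely to show up.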
An example of the sort of strengthening I wouldn't be surprised to see is something like "If $V$ is not too badly behaved in the following ways, and for all $v$ we have [some light-tailedness condition] on the conditional distribution $(X \mid V = v)$, then catastrophic Goodhart doesn't happen." This seems relaxed enough that you could actually encounter it in practice.
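As a toy illustration of the kind of claim at stake (a hypothetical sketch, not the post's actual result: take $U = V + X$ with $V$ and $X$ independent, and use normal vs. Cauchy errors as stand-ins for light vs. heavy tails), here's what hard selection on the proxy does to the true value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000

V = rng.normal(size=n)                      # true value
errors = {
    "light-tailed X (normal)": rng.normal(size=n),
    "heavy-tailed X (Cauchy)": rng.standard_cauchy(size=n),
}

for name, X in errors.items():
    U = V + X                               # proxy = true value + error
    cutoff = np.quantile(U, 0.9999)         # optimize hard: keep the top 0.01% by proxy score
    print(f"{name}: E[V | U in top 0.01%] ≈ {V[U >= cutoff].mean():.2f}")

# Typical result: selecting on an extreme proxy buys substantial true value when the
# error is light-tailed, but almost none when it's heavy-tailed (the extreme proxy
# scores are mostly extreme error) - the catastrophic-Goodhart pattern.
```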
I'm not sure what you mean formally by these assumptions, but I don't think we're making all of them. Certainly we aren't assuming things are normally distributed - the post is in large part about how things change when we stop assuming normality! I also don't think we're making any assumptions with respect to additivity; $U = V + X$ is more of a notational or definitional choice, though as we've noted in the post it's a framing that one could think doesn't carve reality at the joints. (Perhaps you meant something different by additivity, though - feel...
.000002% — that is, one in five hundred thousand
0.000002 would be one in five hundred thousand, but with the percent sign it's one in fifty million.
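Spelling the conversion out:

$$0.000002\% \;=\; \frac{0.000002}{100} \;=\; 2 \times 10^{-8} \;=\; \frac{1}{50{,}000{,}000}, \qquad \text{while} \qquad 0.000002 \;=\; 2 \times 10^{-6} \;=\; \frac{1}{500{,}000}.$$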
Indeed, even on basic Bayesianism, volatility is fine as long as the averages work out
I agree with this as far as the example given, but I want to push back on oscillation (in the sense of regularly going from one estimate to another) being Bayesian. In particular, the odds you should put on assigning 20% in the future, then 30% after that, then 20% again, then 30% again, and so on for ten up-down oscillations, s...
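To gesture at the kind of bound involved (again leaning only on conservation of expected evidence): from a credence of $20\%$, the chance of ever reaching $30\%$ again is at most $0.2/0.3 = 2/3$, so even ignoring the downward legs, ten separate $0.2 \to 0.3$ moves have probability at most $(2/3)^{10} \approx 1.7\%$.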
These graphs seem concerning to me, but I'm worried about an information cascade before Eliezer's responded or someone with substantial expertise in macroeconomic policy has weighed in, so I'm planning to refrain from voting on this post until a week from now.
(Posting as a comment in case others feel inclined to adopt a similar policy.)
Edit: I've now upvoted, since no contrary info has come in that I've seen and at least one person with experience in economics has commented supportively.
Late comment, but my reactions reading this:
Now's your chance to figure out what the next few obstacles are without my giving you spoilers first. Feel free to post your list under spoiler tags in the comment section.
[lightly edited for LaTeX and typos, not otherwise changed since seeing the spoilers]
1. You don’t know what you want all that legibly, or what kinds of concrete commitments the AI can make. This seems pretty okay, if you’re unhackable - the AI presents you with some formal specification of desiderata and you understand why they’re correct ones
I think a lot of people in AI safety don't think it has a high probability of working (in the sense that the existence of the field caused an aligned AGI to exist where there otherwise wouldn't have been one) - if it turns out that AI alignment is easy and happens by default if people put even a little bit of thought into it, or it's incredibly difficult and nothing short of a massive civilizational effort could save us, then probably the field will end up being useless. But even a 0.1% chance of counterfactually causing aligned AI would be extremely worth...
Paul Christiano provided a picture of non-Singularity doom in What Failure Looks Like. In general there is a pretty wide range of opinions on questions about this sort of thing - the AI-Foom debate between Eliezer Yudkowsky and Robin Hanson is a famous example, though an old one.
"Takoff speed" is a common term used to refer to questions about the rate of change in AI capabilities at the human and superhuman level of general intelligence - searching Lesswrong or the Alignment Forum for that phrase will turn up a lot of discussion about these questions, thou...
Three thoughts on simulations:
I'm not claiming that you should believe this, I'm merely providing you the true information that I believe it.
Something feels off to me about this notion of "a belief about the object level that other people aren't expected to share" from an Aumann's Agreement Theorem point of view - the beliefs of other rational agents are, in fact, enormous updates about the world! Of course Aumannian conversations happen exceedingly rarely outside of environments with tight verifiable feedback loops about correctness, so in the real world maybe something like these ...
Since this very old post shows up prominently in the search results for New York rationality meetups, it’s worth clarifying that these are still going strong as of 2022! The google group linked in this post is still active and serves as the primary source of meetup announcements; good faith requests to join are generally approved.
Of course the utility lost by missing a flight is vastly greater than that of waiting however long you’d have needed to in order to make it. But it’s a question of expected utilities - if you’re currently so cautious that you could take 1000 flights and never miss one, you’re arriving early enough to get a 99.9% chance of catching each flight. If showing up 2 minutes later lowers that to 99.8%, you’re not trading 2 minutes per missed flight, you’re trading 2000 minutes per missed flight, which seems worth it.
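As a quick expected-value calculation: showing up 2 minutes later saves 2 minutes on every flight, at the cost of roughly one extra missed flight per 1000 flights, i.e.

$$\frac{2 \text{ min saved per flight}}{0.001 \text{ extra misses per flight}} \;=\; 2000 \text{ min} \;\approx\; 33 \text{ hours of saved time per additional missed flight},$$

so the trade is favorable whenever missing a flight costs you less than that.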
I think people typically hang out for as long as they want, and the size of the group gradually dwindles. There's no official termination point - I'd be a little surprised if more than half of people were left by 7:30, but I'd also be surprised if there weren't at least some meetup attendees still interacting at 10 PM or later.
A path ads could take that seems like it would be both more ethical and more profitable, yet which I don't see happening: actually getting direct consumer feedback!
I like the concept of targeted ads showing me things I enjoy and am interested in, but empirically, they're not very good at it! Maybe it's because I use an adblocker most of the time, but even on my phone, ads are reliably uninteresting to me, and I think the fraction that I click on, or that update me positively towards the company, must be far below 1%.* So why don't advertisers have an option for me to say...
You could pick many plausible metrics (number of matches, number of replies to messages, number of dates, number of long-term relationships), but it seems unlikely that any of them wouldn't be positively impacted, for most people in the online dating market, by having better photos. Do you have reason to think that two reasonable metrics of success would affect the questions raised in this post differently?
A problem I have that I think is fairly common:
Curious if anyone who once had this problem feels like they've resolved it, and if so what worked!