Trying to get into alignment. Have a low bar for reaching out!
I think[1] people[2] probably trust individual tweets way more than they should.
Like, just because someone sounds very official and serious, and it's a piece of information that's in line with your worldview, doesn't mean it's actually true. Or maybe it is true, but missing important context. Or it's saying A causes B when it's more like A and C and D all cause B together, and actually most of the effect is from C but now you're laser-focused on A.
Also, you should be wary that the tweets you're seeing are optimized to pique the interest of people like you, not for truth.
I'm definitely not the first person to say this, but it feels worth saying again.
Sorry, is there a specific timezone for when applications close, or is it AoE?
Man, politics really is the mind killer
I think knowing the karma and agreement is useful, especially to help me decide how much attention to pay to a piece of content, and I don't think there's that much distortion from knowing what others think. (i.e., overall benefits>costs)
Thanks for putting this up! Just to double check—there aren't any restrictions against doing multiple AISC projects at the same time, right?
Is there no event on Oct 29th?
Wait a minute, "agentic" isn't a real word? It's not on dictionary.com or Merriam-Webster or Oxford English Dictionary.
I agree that if you put more limitations on what heuristics are and how they compose, you end up with a stronger hypothesis. I think it's probably better to leave that out and try to do some more empirical work before making a claim there though (I suppose you could say that the hypothesis isn't actually making a lot of concrete predictions yet at this stage).
I don't think (2) necessarily follows, but I do sympathize with your point that the post is perhaps a more specific version of the hypothesis that "we can understand neural network computation by doing mech interp."
Thanks for reading my post! Here's how I think this hypothesis is helpful:
It's possible that we wouldn't be able to understand what's going on even if we had some perfect way to decompose a forward pass into interpretable constituent heuristics. I'm skeptical that this would be the case, mostly because I think (1) we can get a lot of juice out of auto-interp methods and (2) we probably wouldn't need to understand that many heuristics at the same time (as you would in your logic gate example for modern computers). At the minimum, I would argue that the decomposed bag of heuristics is likely to be much more interpretable than the original model itself.
If the hypothesis is true, then it at least suggests that interpretability researchers should put more effort into finding and studying individual heuristics/circuits, as opposed to the current more "feature-centric" framework. I don't know exactly how this would manifest, but it felt worth saying. I believe that some of the empirical work I cited suggests that we might make more incremental progress if we focused more on heuristics right now.
This chapter on AI follows immediately after the year in review. I went and checked the previous few years' annual reports to see what the comparable chapters were about. They are:
2023: China's Efforts To Subvert Norms and Exploit Open Societies
2022: CCP Decision-Making and Xi Jinping's Centralization Of Authority
2021: U.S.-China Global Competition (Section 1: The Chinese Communist Party's Ambitions and Challenges at its Centennial)
2020: U.S.-China Global Competition (Section 1: A Global Contest For Power and Influence: China's View of Strategic Competition With the United States)
And this year it's Technology And Consumer Product Opportunities and Risks (Chapter 3: U.S.-China Competition in Emerging Technologies)
Reminds me of when Richard Ngo said something along the lines of "We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier everyone would want to do AI safety, and the question is guiding people to the right AI safety policies."