Sodium

Trying to get into alignment. Have a low bar for reaching out!

247ca7912b6c1009065bade7c4ffbdb95ff4794b8dadaef41ba21238ef4af94b

Comments

Sodium50

This chapter on AI follows immediately after the year in review. I went and checked the previous few years' annual reports to see what the comparable chapters were about. They are:

2023: China's Efforts To Subvert Norms and Exploit Open Societies

2022: CCP Decision-Making and Xi Jinping's Centralization Of Authority

2021: U.S.-China Global Competition (Section 1: The Chinese Communist Party's Ambitions and Challenges at its Centennial)

2020: U.S.-China Global Competition (Section 1: A Global Contest For Power and Influence: China's View of Strategic Competition With the United States)

And this year it's Technology And Consumer Product Opportunities and Risks (Chapter 3: U.S.-China Competition in Emerging Technologies)

Reminds me of when Richard Ngo said something along the lines of: "We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier, everyone will want to do AI safety, and the question is guiding people to the right AI safety policies."

Sodium30

I think[1] people[2] probably trust individual tweets way more than they should. 

Like, just because someone sounds very official and serious, and it's a piece of information that's in line with your worldview, doesn't mean it's actually true. Or maybe it is true, but missing important context. Or it's saying A causes B when it's more like A and C and D all cause B together, and actually most of the effect is from C, but now you're laser focused on A.

Also you should be wary that the tweets you're seeing are optimized for piquing the interests of people like you, not truth. 

I'm definitely not the first person to say this, but it feels worth saying again.

  1. ^

    75% Confident maybe?

  2. ^

    including some rationalists on here

Sodium20

Sorry, is there a specific timezone for when applications close, or is it AoE?

Sodium43

Man, politics really is the mind killer

Sodium30

I think knowing the karma and agreement is useful, especially to help me decide how much attention to pay to a piece of content, and I don't think there's that much distortion from knowing what others think (i.e., overall benefits > costs).

SodiumΩ010

Thanks for putting this up! Just to double check—there aren't any restrictions against doing multiple AISC projects at the same time, right?

Sodium20

Wait a minute, "agentic" isn't a real word? It's not on dictionary.com or Merriam-Webster or Oxford English Dictionary.

Sodium30

I agree that if you put more limitations on what heuristics are and how they compose, you end up with a stronger hypothesis. I think it's probably better to leave that out and try to do some more empirical work before making a claim there, though (I suppose you could say that the hypothesis isn't actually making a lot of concrete predictions yet at this stage).

I don't think (2) necessarily follows, but I do sympathize with your point that the post is perhaps a more specific version of the hypothesis that "we can understand neural network computation by doing mech interp."

Sodium30

Thanks for reading my post! Here's how I think this hypothesis is helpful:

It's possible that we wouldn't be able to understand what's going on even if we had some perfect way to decompose a forward pass into interpretable constituent heuristics. I'm skeptical that this would be the case, mostly because I think (1) we can get a lot of juice out of auto-interp methods and (2) we probably wouldn't need to understand that many heuristics simultaneously (which is what your logic gate example for modern computers would require). At a minimum, I would argue that the decomposed bag of heuristics is likely to be much more interpretable than the original model itself.

If the hypothesis is true, then it at least suggests that interpretability researchers should put more effort into finding and studying individual heuristics/circuits, as opposed to the current, more "feature-centric" framework. I don't know how this would manifest exactly, but it felt worth saying. I believe some of the empirical work I cited suggests that we might make more incremental progress if we focused on heuristics more right now.
