Stephen Fowler


A while ago, I watched recordings of the lectures given by Wolpert and Kardes at the Santa Fe Institute, and I am extremely excited to see you and Marcus Hutter working in this area.

Could you speculate on whether you see this work having any direct implications for AI Safety?

 

Edit:

I was incorrect. The recordings I watched were actually from 2022 and were not the lectures Wolpert and Kardes gave at the Santa Fe Institute.

Signalling that I do not like linkposts to personal blogs.

"cannot imagine a study that would convince me that it "didn't work" for me, in the ways that actually matter. The effects on my mind kick in sharply, scale smoothly with dose, decay right in sync with half-life in the body, and are clearly noticeable not just internally for my mood but externally in my speech patterns, reaction speeds, ability to notice things in my surroundings, short term memory, and facial expressions."

The drug actually working would mean that your life is better after 6 years of taking the drug compared to the counterfactual where you took a placebo.

The observations you describe are explained simply by a chemical dependency on a drug you have been taking for 6 years.

"In an argument between a specialist and a generalist, the expert usually wins by simply (1) using unintelligible jargon, and (2) citing their specialist results, which are often completely irrelevant to the discussion. The expert is, therefore, a potent factor to be reckoned with in our society. Since experts both are necessary and also at times do great harm in blocking significant progress, they need to be examined closely. All too often the expert misunderstands the problem at hand, but the generalist cannot carry though their side to completion. The person who thinks they understand the problem and does not is usually more of a curse (blockage) than the person who knows they do not understand the problem.’
—Richard W. Hamming, “The Art of Doing Science and Engineering”

***

(Side note: 

I think there's at least a 10% chance that a randomly selected LessWrong user would find it worth their time to read at least some of the chapters in this book. Significantly more users would agree that skimming the contents and introduction, to decide whether they're in that 10%, is a good use of their time in expectation.

That is to say, I recommend this book.)

Robin Hanson recently wrote about two dynamics that can emerge when individuals within an organization work as a group to reach decisions. These are the "outcome game" and the "consensus game."

In the outcome game, individuals aim to be seen as advocating for decisions that are later proven correct. In contrast, the consensus game focuses on advocating for decisions that are most immediately popular within the organization. When most participants play the consensus game, the quality of decision-making suffers.

The incentive structure within an organization influences which game people play. When feedback on decisions is immediate and substantial, individuals are more likely to engage in the outcome game. Hanson argues that capitalism's key strength is its ability to make outcome games more relevant.

However, if an organization is insulated from the consequences of its decisions or feedback is delayed, playing the consensus game becomes the best strategy for gaining resources and influence. 

This dynamic is particularly relevant in the field of (existential) AI Safety, which needs to develop strategies to mitigate risks before AGI is developed. Currently, we have zero concrete feedback about which strategies can effectively align complex systems whose intelligence equals or exceeds our own.

As a result, it is unsurprising that most alignment efforts avoid tackling seemingly intractable problems. The incentive structures in the field encourage individuals to play the consensus game instead. 

I'm not saying that this would necessarily be a step in the wrong direction, but I don't think a Discord server is capable of fixing a deeply entrenched cultural problem among safety researchers.
 

If moderating the server takes up a few hours of John's time per week, the opportunity cost probably isn't worth it.

Worth emphasizing that cognitive work is more than just a parallel to physical work; it is literally work in the physical sense.

The reduction in entropy required to train a model means that there is a minimum amount of work required to do it. 
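As a back-of-the-envelope illustration of that floor (my own sketch, not from the post, and the bit count below is a made-up placeholder): Landauer's principle puts the minimum work to irreversibly erase one bit at k_B · T · ln 2.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assume roughly room temperature, K

# Landauer bound: minimum work to erase one bit of information at temperature T.
work_per_bit = K_B * T * math.log(2)  # ~2.9e-21 J

# Hypothetical number of irreversible bit operations in a training run,
# a placeholder purely for illustration.
bit_erasures = 1e25

print(f"Minimum work per bit erased: {work_per_bit:.2e} J")
print(f"Thermodynamic floor for this hypothetical run: {work_per_bit * bit_erasures:.2e} J")
```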

I think this is a very important research direction, not merely as an avenue for communicating and understanding AI Safety concerns, but potentially as a framework for developing AI Safety techniques. 

There is some minimum amount of cognitive work required to pose an existential threat; perhaps it is much higher than the amount of cognitive work required to perform economically useful tasks.
 

Would you expect the applications to interpretability to hold for inputs that are radically out of distribution?

My naive intuition is that by taking derivatives you are only describing local behaviour.

(I am "shooting from the hip" epistemically)

A loss of this type of (very weak) interpretability would be quite unfortunate from a practical safety perspective.


This is bad, but perhaps there is a silver lining.

If internal communication within the scaffold appears to be in plain English, it will tempt humans to assume that its actual function coincides precisely with the surface semantic content of the message.

If the chain of thought contains seemingly nonsensical content, it will be impossible to make this assumption.
