What I think is more likely than EA pivoting is that a handful of people launch a lifeboat and recreate a high-integrity version of EA.
Thoughts on how this might be done:
Interview a bunch of people who became disillusioned. Try to identify common complaints.
For each common complaint, research organizational psychology, history of high-performing organizations, etc. and brainstorm institutional solutions to address that complaint. By "institutional solutions", I mean approaches which claim to e.g. fix an underlying bad incentive structure, so it won't require continuous heroic effort to address the complaint.
Combine the most promising solutions into a charter for a new association of some kind. Solicit criticism/red-teaming for the charter.
Don't try to replace EA all at once. Start small by aiming at a particular problem present in EA, e.g. bad funding incentives, criticism (right now it's too costly both to give and to receive), or bad feedback loops in the area of AI safety. Initially focus on solving that particular problem, but also build in the capability to scale up and address additional problems if things are going well.
Don't market this as a "replacement for EA". There's no reason to have an adversarial relationship. When describing the new thing, focus on the specific problem which was selected as the initial focus, plus the distinctive features of the charter and the problems they are supposed to solve.
Think of this as an experiment, where you're aiming to test one or more theses about what charter content will cause organizational outperformance.
I think it would be interesting if someone put together a reading list on high-performing organizations, social movement history, etc. I suspect this is undersupplied on the current margin, compared with observing and theorizing about EA as it exists now. Without any understanding of history, you run the risk of being a "general fighting the last war" -- addressing the problems EA has now, but inadvertently introducing a new set of problems. Seems like the ideal charter would exist in the intersection of "inside view says this will fix EA's current issues" and "outside view says this has worked well historically".
A reading list might be too much work, but there's really no reason not to do an LLM-enabled literature review of some kind, at the very least.
I also think a reading list for leadership could be valuable. One impression of mine is that "EA leaders" aren't reading books about how to lead, research on leadership, or accounts of what great leaders did.
The possibility for the society-like effect of multiple power centres creating prosocial incentives on the projects
OpenAI behaves in a generally antisocial way, inconsistent with its charter, yet other power centers haven't reined it in. Even in the EA and rationalist communities, people don't seem to have asked questions like "Is the charter legally enforceable? Should people besides Elon Musk be suing?"
If an idea is failing in practice, it seems a bit pointless to discuss whether it will work in theory.
One idea is to use a base LLM with no RLHF, compute the perplexity of the reasoning text, and add it as an additional term in the loss function. That should help with comprehensibility, but it doesn't necessarily help with steganography. To disincentivize steganography, you could add noise to the reasoning in various ways, and remove any incentive for terseness, to ensure the model isn't trying to squeeze more communication into a limited token budget.
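For concreteness, here's a minimal sketch of that penalty term, using a frozen GPT-2 via Hugging Face transformers as a stand-in for "a base LLM with no RLHF". The weighting coefficient is arbitrary, and in an RL setup the penalty would enter through the reward rather than a differentiable loss, since you can't backpropagate through sampled text:

```python
# Sketch: penalize reasoning text that a frozen, non-RLHF base model finds surprising.
# Assumes Hugging Face transformers; "gpt2" stands in for an arbitrary base LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
base_model.eval()  # frozen scorer; its weights are never updated

def reasoning_nll(reasoning_text: str) -> torch.Tensor:
    """Mean negative log-likelihood of the chain-of-thought under the base model."""
    ids = base_tokenizer(reasoning_text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = base_model(ids, labels=ids)
    return out.loss  # exp() of this is the perplexity

def total_objective(task_loss: torch.Tensor, reasoning_text: str, penalty_weight: float = 0.1):
    # task_loss comes from whatever objective trains the main model; the penalty
    # nudges it toward reasoning that reads as ordinary natural language.
    return task_loss + penalty_weight * reasoning_nll(reasoning_text)
```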
A basic idea for detecting steganography is to monitor next-token probabilities for synonym pairs. If they tend to diverge over time, that suggests the model is assigning additional hidden meaning. (This might fail if the synonym embeddings are too close.)
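A rough sketch of what that monitor could look like, assuming a Hugging Face-style model. The probe contexts and synonym pairs are placeholder examples, and only single-token synonyms are compared:

```python
# Sketch: track whether a model's next-token probabilities for near-synonyms drift apart
# over training, which could indicate hidden meaning being loaded onto word choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROBE_CONTEXTS = ["The weather today is quite", "Overall, the results were"]
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast")]  # single-token synonyms only

def synonym_log_odds(model_name: str) -> dict:
    """For each (context, pair), return log P(word_a) - log P(word_b) for the next token."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    scores = {}
    for ctx in PROBE_CONTEXTS:
        ids = tok(ctx, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]  # next-token logits
        logprobs = torch.log_softmax(logits, dim=-1)
        for a, b in SYNONYM_PAIRS:
            id_a = tok(" " + a, add_special_tokens=False).input_ids[0]
            id_b = tok(" " + b, add_special_tokens=False).input_ids[0]
            scores[(ctx, a, b)] = (logprobs[id_a] - logprobs[id_b]).item()
    return scores

# Monitoring idea: compute this for successive checkpoints and flag pairs whose log-odds
# drift far from the base model's values in contexts where either word would fit.
# before = synonym_log_odds("gpt2")                 # base model
# after = synonym_log_odds("my-finetuned-ckpt")     # hypothetical later checkpoint
```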
My preferred approach to CoT would be something like:
Give human raters the task of next-token prediction on a large text corpus. Have them write out their internal monologue when trying to predict the next word in a sentence.
Train a model to predict the internal monologue of a human rater, conditional on previous tokens.
Train a second model to predict the next token in the corpus, conditional on previous tokens in the corpus and also the written internal monologue.
Only combine the above two models in production.
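A rough sketch of how the two models might be combined at inference time; `generate_monologue` and `next_token` are hypothetical interfaces standing in for whatever the two trained models actually expose:

```python
# Sketch of the combined decoding loop for the two-model setup described above.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CoTDecoder:
    monologue_model: object  # trained on (prefix -> human rater monologue)
    predictor_model: object  # trained on (prefix + monologue -> next corpus token)
    trace: List[Tuple[str, str]] = field(default_factory=list)  # (token, monologue) pairs

    def generate(self, prompt: str, max_tokens: int = 50) -> str:
        text = prompt
        for _ in range(max_tokens):
            # Step 1: produce the human-imitating "working memory" for this position.
            monologue = self.monologue_model.generate_monologue(text)
            # Step 2: predict the next token, conditioned on the text plus the monologue.
            token = self.predictor_model.next_token(text, monologue)
            # Keep the monologue around so a UI can show what the model was "thinking"
            # when it chose each token (the mouse-over idea described below).
            self.trace.append((token, monologue))
            text += token
        return text
```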
Now that you've embedded CoT in the base model, maybe it will be powerful enough that you can discard RLHF, and replace it with some sort of fine-tuning on PhDs roleplaying as a helpful/honest/harmless chatbot.
Basically give the base model a sort of "working memory" that's incentivized for maximal human imitativeness and interpretability. Then you could build an interface where a person can mouse over any word in a sentence and see what the model was 'thinking' when it chose that word. (Realistically you wouldn't do this for every word in a sentence, just the trickier ones.)
If that's true, perhaps the performance penalty for pinning/freezing weights in the 'internals', prior to the post-training, would be low. Couldn't that ease interpretability a lot, since you wouldn't need to worry so much about the internals that post-training never touched?
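If the performance penalty really is low, the mechanical part is simple. Here's a minimal PyTorch-style sketch of freezing everything except a chosen slice before post-training; the layer-name prefixes are placeholders, not any particular model's real layout:

```python
# Sketch: freeze all parameters except a small trainable slice before post-training,
# so the bulk of the base model's internals stay identical to the version already studied.
import torch.nn as nn

def freeze_internals(model: nn.Module, trainable_prefixes=("lm_head", "transformer.h.11")):
    """Freeze every parameter whose name doesn't start with a trainable prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"Trainable params: {n_trainable}/{n_total}")
    return model
```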
On LessWrong, there's a comment section where hard questions can be asked and are asked frequently.
In my experience, asking hard questions here is quite socially unrewarding. I could probably think of a dozen or so cases where I think the LW consensus "emperor" has no clothes, that I haven't posted about, just because I expect it to be an exercise in frustration. I think I will probably quit posting here soon.
I don't think AI policy is a good example for discourse on LessWrong. There are strategic reasons to be less transparent about how to affect public policy than for most other topics.
In terms of advocacy methods, sure. In terms of desired policies, I generally disagree.
Everything that's written publicly can be easily picked up by journalists wanting to write stories about AI.
If that's what we are worried about, there is plenty of low-hanging fruit in terms of e.g. not tweeting wildly provocative stuff for no reason. (You can ask for examples, but be warned, sharing them might increase the probability that a journalist writes about them!)
"The far left is censorious" and "Republicans are censorious" are in no way incompatible claims :-)
Great post. Self-selection seems huge for online communities, and I think it's no different on these fora.
Confidence level: General vague impressions and assorted thoughts follow; could very well be wrong on some details.
A disagreement I have with both the rationalist and EA communities is what the process of coming to robust conclusions looks like. In those communities, it seems like the strategy is often to identify a few super-geniuses who go do a super-deep analysis, and come to a conclusion that's assumed to be robust and trustworthy. See the "Groupthink" section on this page for specifics.
From my perspective, I would rather see an ordinary genius do an ordinary-depth analysis, and then have a bunch of other people ask a bunch of hard questions. If the analysis holds up against all those hard questions, then the conclusion can be taken as robust.
Everyone brings their own incentives, intuitions, and knowledge to a problem. If a single person focuses a lot on a problem, they run into diminishing returns regarding the number of angles of attack. It seems more effective to generate a lot of angles of attack by taking the union of everyone's thoughts.
From my perspective, placing a lot of trust in top EA/LW thought leaders ironically makes them less trustworthy, because people stop asking whether the emperor has any clothes.
The problem with saying the emperor has no clothes is: Either you show yourself a fool, or else you're attacking a high-status person. Not a good prospect either way, in social terms.
EA/LW communities are an unusual niche with opaque membership norms, and people may want to retain their "insider" status. So they do extra homework before accusing the emperor of nudity, and might just procrastinate indefinitely.
There can also be a subtle aspect of circular reasoning to thought leadership: "we know this person is great because of their insights", but also "we know this insight is great because of the person who said it". (Certain celebrity users on these fora get 50+ positive karma on basically every top-level post. Hard to believe that the authorship isn't coloring the perception of the content.)
A recent illustration of these principles might be the pivot to AI Pause. IIRC, it took a "super-genius" (Katja Grace) writing a super long post before Pause became popular. If an outsider simply said: "So AI is bad, why not make it illegal?" -- I bet they would've been downvoted. And once that's downvoted, no one feels obligated to reply. (Note, also -- I don't believe there was much reasoning transparency regarding why the pause strategy was considered unpromising at the time. You kinda had to be an insider like Katja to know the reasoning in order to critique it.)
In conclusion, I suspect there are a fair number of mistaken community beliefs which survive because (1) no "super-genius" has yet written a super-long post about them, and (2) poking around by asking hard questions is disincentivized.
Yeah, I think there are a lot of underexplored ideas along these lines.
It's weird how so much of the internet seems locked into either the reddit model (upvotes/downvotes) or the Twitter model (likes/shares/followers), when the design space is so much larger than that. Someone like Aaron, who played such a big role in shaping the internet, seems more likely to have a gut-level belief that it can be shaped. I expect there are a lot more things like Community Notes that we could discover if we went looking for them.
I've always wondered what Aaron Swartz would think of the internet now, if he was still alive. He had far-left politics, but also seemed to be a big believer in openness, free speech, crowdsourcing, etc. When he was alive those were very compatible positions, and Aaron was practically the poster child for holding both of them. Nowadays the far left favors speech restrictions and is cynical about the internet.
Would Aaron have abandoned the far left, now that they are censorious? Would he have become censorious himself? Or would he have invented some clever new technology, like RSS or reddit, to try and fix the internet's problems?
Just goes to show what a tragedy death is, I guess.
That sounds like an exaggeration? My impression is that China has OK/good relations with countries such as Vietnam, Cambodia, Pakistan, Indonesia, North Korea, factions in Myanmar. And Russia, of course. If you're serious about this claim, I think you should look at a map, make a list of countries which qualify as "neighbors" based purely on geographic distance, then look up relations for each one.