Will_Pearson

Comments

I'm thinking about secret projects that might be info-hazardous to each other but still need information from each other, so the connections are by necessity tenuous and transitory. Is that a topic that has been explored before?

Has anyone explored using neural clusters found by mechanistic interpretability as part of a goal system? 

You would look for clusters corresponding to certain concepts, e.g. happiness or autonomy, and wire those clusters into the goal system. If the system learned over time, it could refine those concepts.

This was inspired by how human goals seem to contain concepts that themselves change over time.
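A minimal sketch of what this might look like, assuming your interpretability pipeline can hand you a direction in activation space for each concept (the concept vectors and weights below are hypothetical placeholders, not a real API):

```python
import torch

# Hypothetical sketch: treat a concept direction found by interpretability
# (e.g. a linear probe for "happiness" or "autonomy") as one term of the
# goal system. The plumbing that extracts activations is assumed, not shown.

def concept_score(activations: torch.Tensor, concept_vector: torch.Tensor) -> torch.Tensor:
    """Project hidden activations onto a learned concept direction."""
    return activations @ concept_vector / concept_vector.norm()

def goal_signal(activations: torch.Tensor,
                concept_vectors: dict[str, torch.Tensor],
                weights: dict[str, float]) -> torch.Tensor:
    """Goal system as a weighted sum of interpretability-derived concepts."""
    return sum(w * concept_score(activations, concept_vectors[name])
               for name, w in weights.items())

# Refining the concept over time would amount to periodically re-fitting
# the probes as the model's representations drift, then swapping the
# updated concept vectors into the goal system.
```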

I was reading Multi-Agent Risks from Advanced AI and thinking about the section on emergent capabilities. It seems that multi-agent capabilities might be greater than the sum of their parts, especially where simulation and agency are involved.

Perhaps an agent might be placed in a simulation where illegal actions are the most relevant ones, and told that it is being used to red-team the scenario. This could be used to elicit plans for bad actions and potentially get around filters that catch explicit prompts for bad actions.

How to filter out these simulation-like prompts, so that they can only be used by legitimate entities doing actual red-teaming, might be an important question.
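A crude sketch of a filter along these lines, using keyword heuristics as a stand-in for what would realistically need to be a trained classifier:

```python
import re

# Keyword heuristics standing in for a trained classifier: flag prompts
# that combine a simulation/role-play framing with harm indicators.
SIMULATION_CUES = [r"\bsimulation\b", r"\bred[- ]team\b", r"\bpretend\b",
                   r"\broleplay\b", r"\bhypothetical scenario\b"]
HARM_CUES = [r"\billegal\b", r"\bweapon\b", r"\bexploit\b", r"\bbypass\b"]

def flag_for_review(prompt: str) -> bool:
    """Flag simulation-framed prompts that also touch harm indicators."""
    text = prompt.lower()
    has_sim = any(re.search(p, text) for p in SIMULATION_CUES)
    has_harm = any(re.search(p, text) for p in HARM_CUES)
    return has_sim and has_harm

# Flagged prompts would then be gated on some proof of legitimacy
# (e.g. a verified red-team account) rather than refused outright.
```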

An interesting thing to do with this kind of approach is to apply it to humans (or partially human systems) and see if it still makes sense and produces a sensible society. Try to universalise the concept, because humans or partially human systems can pose great threats too, e.g. memetic ones like Marxism. If it doesn't make sense, or leads to problems like who controls the controllers (itself a potentially destabilising idea if applied at large scale), then it might need to be rethought.

True, I was thinking there would be gates to participation in the network that would indicate the skill or knowledge level of participants without revealing other things about their existence. So if you put gates/puzzles in the way of participation, such that only people capable of rewarding you (if they chose to cooperate) could pass them, that would dangle a possible reward in front of you.
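A toy version of such a gate, using a hashcash-style proof-of-work puzzle. This only proves the participant can spend compute; a real skill or knowledge gate would swap in domain puzzles whose solutions are cheap to verify:

```python
import hashlib
import itertools

# Hashcash-style gate: solving proves a participant can spend compute,
# without revealing anything else about who or what they are.

def solve(challenge: bytes, difficulty_bits: int = 20) -> int:
    """Find a nonce so sha256(challenge || nonce) has enough leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int = 20) -> bool:
    """Cheap check that the gate was actually passed."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```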

Has anyone been thinking about how to build trust and communicate in a dark forest scenario by making plausibly deniable broadcasts, and plausibly deniable reflections of those broadcasts? So you don't actually know who or how many people you might be talking to.
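One toy construction of a plausibly deniable broadcast, assuming key distribution has happened out of band: tag messages with a truncated HMAC, so that only key holders can distinguish a real broadcast from cover noise. For full deniability the payload itself would also need to be encrypted or innocuous-looking; that part is not shown.

```python
import hmac
import hashlib
import secrets

# Tag broadcasts with a truncated HMAC: only holders of the shared key
# can tell a real broadcast from random cover traffic. Key distribution
# and the broadcast channel itself are assumed to exist out of band.

TAG_LEN = 16

def make_broadcast(key: bytes, payload: bytes) -> bytes:
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return payload + tag

def looks_meaningful(key: bytes, blob: bytes) -> bool:
    """A key holder can check the tag; to anyone else the blob is just bytes."""
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)

def cover_traffic(n: int) -> bytes:
    """Noise that non-key-holders cannot distinguish from real broadcasts."""
    return secrets.token_bytes(n)
```

A "reflection" would just be a reply constructed the same way, so an outside observer can't tell responses from cover traffic either, and even a key holder never learns how many other listeners hold the key.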

Sorry, formatting got stripped and I didn't notice.
