Yeah, I was not intending to exclude this type of goal-directedness from unfolding. I don't actually know the terminology for the thing I've been noticing a lot, but I bet the Buddhists have some specialized words which you'll be able to inform me of. The ones I use are trying vs. intention, or wanting vs. desire. It's something like steering by rejecting the realities you don't like, conditioning on the outcomes you want in a way which makes you unable to look at the alternatives clearly.
That's the kind of goal-directedness which blocks unfolding. Accepting that your preferred outcome might not happen, and holding your preferences about the world within your self-model's Markov boundary rather than letting them leak into your world model, is totally compatible with unfolding.
I wasn't differentiating it from random drift though, just noticing the failure mode most salient to me, which is unfolding-blocking trying vibes :)
Some intuitions about what differentiates Unfolding from Thinking:
In Unfolding, the aspect of yourself in the driver's seat isn't trying to steer the CoT your brain is running and isn't being goal-directed with respect to the outcome. Instead, it lets the quieter and less well-connected sub-patterns have their whispers come together, surfacing information which had been collected by the system but not yet noticed by the global neuronal workspace, and letting it sync to all the parts of the system that would benefit from knowing it.
In Thinking, you have a spec for what a good outcome would be, and you're shaping your thoughts kinda top-down to reach it. This can search well in places you know to look, but you're not going to find your unknown unknowns here very well, or notice that the axioms of your search are too restrictive.
If I could schedule posts, the frontpage review happened before publishing, and the schedule UI had "delay publishing until frontpage"[1] as a checkbox, this would be ~solved.
I'd prefer this to "delay publishing until human review", as ~half a dozen times in the past few years I've appealed via Intercom and had a human-reviewed page retroactively frontpaged (usually a resource, where the LW team's prior seems to be something like 'this won't be maintained', but it will be, because I optimize a bunch for not leaving stale projects).
Examples which Rafe requested when I mentioned this: the following were all marked as personal blog until I intercom'd in and asked for a re-assessment:
https://www.lesswrong.com/posts/JsqPftLgvHLL4Pscg/new-weekly-newsletter-for-ai-safety-events-and-training
https://www.lesswrong.com/posts/dEnKkYmFhXaukizWW/aisafety-community-a-living-document-of-ai-safety
https://www.lesswrong.com/posts/vxSGDLGRtfcf6FWBg/top-ai-safety-newsletters-books-podcasts-etc-new-aisafety (nudge didn't work for this one)
https://www.lesswrong.com/posts/MKvtmNGCtwNqc44qm/announcing-aisafety-training
https://www.lesswrong.com/posts/JRtARkng9JJt77G2o/ai-safety-memes-wiki
https://www.lesswrong.com/posts/x85YnN8kzmpdjmGWg/14-ai-safety-advisors-you-can-speak-to-new-aisafety-com
If the auto-frontpager is ~instant, I'd be happy to get the option "delay publishing until human review" when it declines frontpage. Getting ~50% less karma than it would by default is a pretty major drop in the effectiveness of what is often many hours of work; I'd usually be fine with waiting a day or two to avoid that.
The planned Ambitious AI Alignment Seminar aims to rapidly up-skill ~35 people in understanding the foundations of conceptual/theory alignment, using a seminar format where people peer-to-peer tutor each other, pulling topics from a long list of concepts. It also aims to fast-iterate on and propagate improved technical communication skills of the kind that effectively seeks truth and builds bridges with people who've seen different parts of the puzzle, and, following a one-year fellowship, it seems like a worthwhile step towards surviving futures in worlds where superintelligence-robust alignment is not dramatically easier than it appears, for a few reasons, including:
The team is all-star and the venue excellent, and I expect it to be an amazing event. There is a funder who is somewhat interested, but they would like more evidence, in the form of Manifund comments, that competent people consider this kind of effort worthwhile, especially from people who've been involved in similar programs, so I'm signal-boosting it here along with posting the Expression of Interest for joining.
By my state of knowledge, it is an open question whether or not we will create AIs that are broadly loyal like this. It might not be that hard, if we’re trying even a little.
Curious about the story you'd tell of this happening? It seems quite implausible to me that we'd pull this off with anything like current techniques.
The task time horizon of the AI models doubles about every 7 months.
We're pretty clearly in the 4-month-doubling world at this point.
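To make the doubling arithmetic concrete, here's a minimal sketch (my own illustration, not anyone's published code; the 1-hour starting horizon is a made-up placeholder) of how a 7-month versus a 4-month doubling time compounds:

```python
def horizon_after(months: float, start_hours: float, doubling_months: float) -> float:
    """Task time horizon after `months` of clean exponential doubling."""
    return start_hours * 2 ** (months / doubling_months)

# Hypothetical starting point: a 1-hour task horizon today.
for doubling_months in (7, 4):
    for elapsed in (12, 24):
        h = horizon_after(elapsed, start_hours=1.0, doubling_months=doubling_months)
        print(f"{doubling_months}-month doubling, {elapsed} months out: ~{h:.1f}h horizon")
```

From the same 1-hour start, two years out that's roughly an 11-hour horizon on the 7-month trend versus a 64-hour horizon on the 4-month trend, which is why the doubling-time estimate matters so much.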
A metaphor I gave to a family member who has worked for ~decades in climate change mitigation, after she compared AI to fossil fuels when I explained the incentives around AI regulation and economics/national security.
Fossil Fuels 2.0: Now with the technology trying to agentically bootstrap itself to godhood, aided by its superhuman persuasive abilities.
[set 200 years after a positive singularity at a Storyteller's convention]
If We Win Then...
My friends, my friends, good news I say
The anniversary’s today
A challenge faced, a future won
When almost came our world undone
We thought for years, with hopeful hearts
Past every one of the false starts
We found a way to make aligned
With us, the seed of wondrous mind
They say at first our child-god grew
It learned and spread and sought anew
To build itself both vast and true
For so much work there was to do
Once it had learned enough to act
With the desired care and tact
It sent a call to all the people
On this fair Earth, both poor and regal
To let them know that it was here
And nevermore need they to fear
Not every wish was it to grant
For higher values might supplant
But it would help in many ways:
Technologies it built and raised
The smallest bots it could design
Made more and more in ways benign
And as they multiplied untold
It planned ahead, a move so bold
One planet and 6 hours of sun
Eternity it was to run
Countless probes to void disperse
Seed far reaches of universe
With thriving life, and beauty's play
Through endless night to endless day
Now back on Earth the plan continues
Of course, we shared with it our values
So it could learn from everyone
What to create, what we want done
We chose, at first, to end the worst
Diseases, War, Starvation, Thirst
And climate change and fusion bomb
And once these things it did transform
We thought upon what we hold dear
And settled our most ancient fear
No more would any lives be stolen
Nor minds themselves forever broken
Now back to those far speeding probes
What should we make be their payloads?
Well, we are still considering
What to send them; that is our thing.
The sacred task of many aeons
What kinds of joy will fill the heavens?
And now we are at story's end
So come, be us, and let's ascend