Career educator, now a writer at MIRI
MIRI has its monthly newsletters, though I can tell that's not quite what you want. I predict (medium confidence) that we will be upping our level of active coordination with allied orgs and individuals on upcoming projects once we ship some of our current ones. I believe you already have channels to people at MIRI, but feel free to DM me if you want to chat.
Me too! I put the AI problem in the broader class of topics where apathy serves as a dual defense mechanism — not just against needing to expend effort and resources, but against emotional discomfort. You can see the same dual barrier when promoting charitable causes aimed at reducing human misery, or when teaching a subject to students who have really struggled with it in the past.
As a teacher, I attacked both of those roots more deliberately as I grew, trying hard to not make my class feel like work most days while building an atmosphere of low-stakes experimentation where failure could be fun rather than painful. (An example of what success looked like: students taking the risk of trying the more advanced writing approaches I modeled instead of endlessly rewriting the same basic meta essay they had learned in middle school.)
One tactic for eroding defensive apathy is therapeutic empathy. You see this both in many good teachers and (I imagine) in relationship counselors. It's much harder in writing, though I suppose I did a little bit of that in this post when I talked about how the reader and I have probably both felt the pull of Apathy with regards to the AI problem. I think empathy works partly because it builds a human connection, and partly because it brings the feared pain to the surface, where we find (with the help of that human connection) that it can be endured, freeing us to work on the problem that accompanies it.
Whether and how to use authentic human connections in our communications is a topic of ongoing research and debate at MIRI. It has obvious problems with regards to scientific respectability, as there’s this sense in intellectual culture that it’s impossible to be impartial about anything one has feelings about.
And sure, the science itself should be dispassionate. The universe doesn’t care how we feel about it, and our emotions will try to keep us from learning things we don’t want to be true.
But when communicating our findings? To the extent that our task is two-pronged -- (1) communicating the truth as we understand it and (2) eliciting a global response to address it -- I suspect we will need some human warmth and connection in the second prong even as we continue to avoid it in the first. Apathy loves the cold.
From chatting with those peak students during the experiment, I think their experience is more like being in a cafeteria abuzz with the voices of friends and acquaintances. At some point, you're not even trying to follow every conversation, but are just maintaining some vague awareness of the conversations that are taking place and jumping in when you feel like it. People can and do think about other things in a noisy cafeteria. Some even read books! The brain can filter out a constant buzz. It's just wind blowing through the trees.
The upper middle zone where it's still possible to try to follow everything (and maybe even reply) looked like more of an attention trap, and was where I was more likely to find that handful of students I already knew had a problem. The FOMO is probably more distracting than the notifications themselves.
They should not have been counting pull notifications, as they were instructed not to engage with their phones during the experiment except maybe to see what caused a vibration or ding. I don't think students think of pull notifications as real notifications in the sense we were using the word. They were logging the notifications they could notice while their phone was lying flat on their desk, not being touched.
No. Everyone seemed to know what they were, because they all claimed to know someone who uses them. But I don't recall anyone ever admitting to being such a someone. I sense there's a bit of a stigma around them.
It is credible that eliminating all preventable distractions (phones, earbuds, etc.) wouldn't improve learning much. As a teen, I bet you were distracted during class by all sorts of things contained entirely within your head. I know I was!
There's a somewhat stronger case that video games and social media have given students more things to be preoccupied about even if you make these things inaccessible during class. But I also think that just being a hormonal teen is often distracting enough to fill in any attention vacancies faster than the median lesson can.
This is important work.
One suggested tweak: I notice this document starts leaning on the term "loss" in section 4.2 but doesn't tell the reader what that means in this context until 4.3.
Something similar happens with the concept of "weights", first used in section 1.3, but only sort-of-explained later, in 4.2.
Speaking of weights, I notice myself genuinely confused in section 5.2, and I'm not sure if it's a problem with the wording or with my current mental model (which is only semi-technical). A quoted forecast reads:
"GPT-2030’s copies can share knowledge due to having identical model weights, allowing for rapid parallel learning: I estimate 2,500 human-equivalent years of learning in 1 day."
Wouldn't the model doing the sharing have, by definition, different weights than the recipient? (How is a model's "knowledge" stored if not in the weights?) My best guess: shareable "knowledge" would take the form of delta vectors over the models' common foundational base weights -- which should work as long as there hasn't been too much other divergence since the fork. Is that right? And if so, is there some reason this is a forecast capability and not a current one?
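To make my guess concrete, here's a toy sketch of the picture in my head. To be clear, none of this comes from the document: the fork/delta/merge structure, the naive averaging rule, and every name in it (base_weights, copy_a, and so on) are my own assumptions.

```python
# Toy sketch (my assumption, not the document's mechanism): each copy's
# "knowledge" is its weight delta from the common base, and sharing means
# folding those deltas (here, a naive average) back onto the base.
import numpy as np

rng = np.random.default_rng(0)
base_weights = rng.normal(size=8)  # the common foundation both copies forked from

# Each copy trains independently after the fork; model its learning as a
# small perturbation of the base weights.
copy_a = base_weights + rng.normal(scale=0.01, size=8)
copy_b = base_weights + rng.normal(scale=0.01, size=8)

# The shareable "knowledge" is the delta from the base, not the full weights.
delta_a = copy_a - base_weights
delta_b = copy_b - base_weights

# Sharing: apply both deltas to the base. This only works if the copies
# haven't diverged so far that their deltas interfere -- which is exactly
# the caveat in my question.
merged = base_weights + (delta_a + delta_b) / 2
print(merged)
```

If sharing works anything like this, the divergence caveat is doing a lot of work, which is part of why I'm unsure whether I've understood the forecast.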
My apologies for challenging the premise, but I don't understand how anyone could hope to be "convinced" that humanity isn't doomed by AGI unless they're in possession of a provably safe design that they have high confidence of being able to implement ahead of any rivals.
Put aside all of the assumptions you think the pessimists are making and simply ask whether humanity knows how to make a mind that will share our values. If it does, please tell us how. If it doesn't, then accept that any AGI we make is, by default, alien -- and building an AGI is like opening a random portal to invite an alien mind to come play with us.
What is your prior for alien intelligence playing nice with humanity -- or for humanity being able to defeat it? I don't think it's wrong to say we're not automatically doomed. But let's suppose we open a portal and it turns out ok: We share tea and cookies with the alien, or we blow its brains out. Whatever. What's to stop humanity from rolling the dice on another random portal? And another? Unless we just happen to stumble on a friendly alien that will also prevent all new portals, we should expect to eventually summon something we can't handle.
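To put a toy number on the dice-rolling intuition (my arithmetic, not anything from the original post): if each portal independently carries some fixed chance p of a roll we can't handle, the probability of surviving n portals is (1 - p)^n, which falls toward zero for any fixed p > 0. Even at p = 1%, a hundred portals leaves only about a 0.99^100 ≈ 37% chance of never rolling badly.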
Feel free to place wagers on whether humanity can figure out alignment before getting a bad roll. You might decide you like your odds! But don't confuse a wager with a solution.
This is about where I'm at, as well. I've been wrestling with the idea of starting a run myself, but one of my qualifying traits (I teach creative writing) also means I work full time and have little hope of beating out ten people who don't. So much the better, I say, so long as the work gets done well and gets done soon...
...but if, eight months from now, much of the budget is still on the table because of quality issues, it may be because people like me sat on our hands.
Hopefully, someone will emerge early to work around this issue, if it turns out to be one. I, for one, would love to be able to turn in a sample and then be offered a credible good-faith assurance that if my run is completed at the same quality by such-and-such a date, a payment of x will be earned. But as it stands, the deadline is "whenever the fastest mover(s) get there". Who knows when that will be? Any emergent executive candidate making me a deal might be made a liar by a rival who beats them to the jackpot.
Glad you got something out of the post! I recognize and appreciate your generous action, and will DM you with regards to your request.