They should not have been counting pull notifications, as they were instructed to not engage with their phones during the experiment except to maybe see what caused a vibration or ding. I don't think students think of pull notifications as real notifications the way we were using the word. They were logging the notifications they could notice while their phone was flat on their desk, not being touched.
No. Everyone seemed to know what they were, because they all claimed to know someone who uses them. But I don't recall anyone ever admitting to being such a someone. I sense there's a bit of a stigma around them.
It is credible that eliminating all preventable distractions (phones, earbuds, etc.) wouldn't improve learning much. As a teen, I bet you were distracted during class by all sorts of things contained entirely within your head. I know I was!
There's a somewhat stronger case that video games and social media have given students more things to be preoccupied about even if you make these things inaccessible during class. But I also think that just being a hormonal teen is often distracting enough to fill in any attention vacancies faster than the median lesson can.
This is important work.
One suggested tweak: I notice this document starts leaning on the term "loss" in section 4.2 but doesn't tell the reader what that means in this context until 4.3.
Something similar happens with the concept of "weights", first used in section 1.3, but only sort-of-explained later, in 4.2.
Speaking of weights, I notice myself genuinely confused in section 5.2, and I'm not sure if it's a problem with the wording or with my current mental model (which is only semi-technical). A quoted forecast reads:
"GPT-2030’s copies can share knowledge due to having identical model weights, allowing for rapid parallel learning: I estimate 2,500 human-equivalent years of learning in 1 day."
Wouldn't the model doing the sharing have, by definition, different weights than the recipient? (How is a model's "knowledge" stored if not in the weights?) My best guess: shareable "knowledge" would take the form of vectors over the models' common foundational base weights -- which should work as long as there hasn't been too much other divergence since the fork. Is that right? And if so, is there some reason this is a forecast capability and not a current one?
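To make my guess concrete, here is a toy sketch of what I mean by sharing "vectors over common base weights". Everything in it is my own assumption for illustration: the weights are plain lists of floats rather than a real model, the names `delta` and `apply_deltas` are made up, and simply summing the offsets is the most naive merge I could think of, not something the forecast describes.

```python
# Toy illustration only: "weights" are short lists of floats, not a real model.

def delta(base, tuned):
    """What one copy learned, expressed as an offset from the shared base weights."""
    return [t - b for b, t in zip(base, tuned)]

def apply_deltas(base, deltas):
    """Add every copy's offset onto the same base (naive summation, an assumption)."""
    merged = list(base)
    for d in deltas:
        merged = [m + x for m, x in zip(merged, d)]
    return merged

base = [0.0, 1.0, 2.0]    # weights at the fork point
copy_a = [0.5, 1.0, 2.0]  # copy A changed only the first weight
copy_b = [0.0, 1.0, 1.0]  # copy B changed only the third weight

# Each copy shares only its offset; the recipient adds both onto the base.
merged = apply_deltas(base, [delta(base, copy_a), delta(base, copy_b)])
print(merged)
```

In this toy case the two copies changed different weights, so the offsets combine cleanly. If both had moved the same weight, naive summation would overshoot, which is the kind of problem I have in mind when I worry about "too much other divergence".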
My apologies for challenging the premise, but I don't understand how anyone could hope to be "convinced" that humanity isn't doomed by AGI unless they're in possession of a provably safe design that they have high confidence of being able to implement ahead of any rivals.
Put aside all of the assumptions you think the pessimists are making and simply ask whether humanity knows how to make a mind that will share our values. If it does, please tell us how. If it doesn't, then accept that any AGI we make is, by default, alien -- and building an AGI is like opening a random portal to invite an alien mind to come play with us.
What is your prior for alien intelligence playing nice with humanity -- or for humanity being able to defeat it? I don't think it's wrong to say we're not automatically doomed. But let's suppose we open a portal and it turns out ok: We share tea and cookies with the alien, or we blow its brains out. Whatever. What's to stop humanity from rolling the dice on another random portal? And another? Unless we just happen to stumble on a friendly alien that will also prevent all new portals, we should expect to eventually summon something we can't handle.
Feel free to place wagers on whether humanity can figure out alignment before getting a bad roll. You might decide you like your odds! But don't confuse a wager with a solution.
This is about where I'm at, as well. I've been wrestling with the idea of starting a run myself, but one of my qualifying traits (I teach creative writing) also means I work full time and have little hope of beating out ten people who don't. So much the better, I say, so long as the work gets done well and gets done soon...
...but if, eight months from now, much of the budget is still on the table because of quality issues, it may be because people like me sat on our hands.
Hopefully, someone will emerge early to work around this issue, if it turns out to be one. I, for one, would love to be able to turn in a sample and then be offered a credible good-faith assurance that if my run is completed at the same quality by such and such date, a payment of x will be earned. But as it stands, the deadline is "whenever the fastest mover(s) get there". Who knows when that will be? Any emergent executive candidate making me a deal might be made a liar by a rival who beats them to the jackpot.
My questions are mostly about the player side, and about how deeply the DM should model the player:
I don't see as much disagreement between us as you might be thinking. Precisely because I agree with your numbered points 1 and 2, I suggested it could be beneficial to compress most of our 12 years of math instruction down to a more intensive 2-3 years. That doesn't mean we couldn't instill useful basic arithmetic in lower grades. If we chose a smaller set of core basics, it could be quite practical to retain them over long summers and breaks -- at least for the students who stay in our system for the long haul.
I'm also glad you brought up the fact that spaced repetition doesn't have to involve software. I should have done more to remind readers of this. I weave the spacing and testing effects into the fabric of my course in many ways that have nothing to do with software.
Carefully engineered homework assignments are great if you have motivated students. Take-home SRS could even work for that. Those students are usually fine, though. It's the apathetic middle I have to fight for, and they won't do homework regardless of how I try to incentivize it.
Moreover, I don't feel good about assigning homework to students who would hate to do it. School is already prison for those kids. I don't want to send prison home with them. As both a child and a parent, I have been too familiar with the toxic effects homework -- especially math homework -- can have on family relationships. Let kids have a light at the end of the daily tunnel, I say.
Is homework vital to a successful math program? I don't know. But I'm glad I don't teach math.
Did you get IRB approval for these human studies on children?
I'm not sure which is more absurd: the IRB approval process or the very idea of high school. I've often asked people to consider a thought experiment where everyone on Earth suddenly forgets that our educational system as we know it ever existed. Would we really reinvent it just like it is now? Hearing how it worked, would we scream in terror and cancel anyone who had taken part? (Status quo bias much?)
When I was studying stand-up comedy, I actually developed a bit in which I play-acted a researcher proposing high school to an ethics board. It went like this:
RESEARCHER: "I was thinking we could stick 35 sleep-deprived teenagers in a room for an hour and expose them to academic stimuli. After that, we'll do some tests on them."
BOARD: "I see. Tell me more about your subjects."
RESEARCHER: "Well, they're minors, obviously."
BOARD: "Okay..."
RESEARCHER: "And most of them will be enrolled against their will."
BOARD: "And how long will you need them?"
RESEARCHER: "6 sessions a day for four years."
BOARD: "Wait, hold on. Sample size? How many kids are we talking about, here?"
RESEARCHER: "All of them."
BOARD: (mutterings among themselves) "Well, it sounds like everything is in order..."
Are you familiar with Direct Instruction, which is reminiscent of the Mennonite school?
Someone (probably on LW) pointed me to Direct Instruction a few years back, so yes, I'm acquainted with it. Because of the emphasis on staying fully reviewed on all relevant prior knowledge, I saw it as having obvious promise for technical subjects like math, in the hands of the right teacher. I was less convinced it made a good fit elsewhere, perceiving (perhaps unfairly -- I didn't dig too deeply) some big negative trade-offs:
Have you ever tried SRS for muscle memory?
No. I'm not seeing how that would work, or how that would be relevant to what I do, but I'm certainly curious. Do you have examples?
From chatting with those peak students during the experiment, I think their experience is more like being in a cafeteria abuzz with the voices of friends and acquaintances. At some point, you're not even trying to follow every conversation, but are just maintaining some vague awareness of the conversations that are taking place and jumping in when you feel like it. People can and do think about other things in a noisy cafeteria. Some even read books! The brain can filter out a constant buzz. It's just wind blowing through the trees.
The upper middle zone where it's still possible to try to follow everything (and maybe even reply) looked like more of an attention trap, and was where I was more likely to find that handful of students I already knew had a problem. The FOMO is probably more distracting than the notifications themselves.