Dumbledore likely would have known what it meant, and I think Alastor at the very least would have put together the most crucial parts as well.
The part that was numb with grief and guilt took this opportunity to observe, speaking of obliviousness, that after events at Hogwarts had turned serious, they really really really REALLY should have reconsidered the decision made on First Thursday, at the behest of Professor McGonagall, not to tell Dumbledore about the sense of doom that Harry got around Professor Quirrell. It was true that Harry hadn't been sure who to trust, there was a long stretch where it had seemed plausible that Dumbledore was the bad guy and Professor Quirrell the heroic opposition, but...
Dumbledore would have realised.
Dumbledore would have realised instantly.
For me that fell under ‘My simulation of Voldemort isn’t buying that he can rely on this, not for something so crucial.’
And the answer was: “All right. There is a curse on the Defence Professor position. There has always been a curse on the Defence Professor position. The school has adapted to it. Harry has gotten into just the right kind of shenanigan to cause McGonagall to panic about this, and give Harry the instructions he needs to hear to prevent him from just taking certain matters to McGonagall.”
The question I always had here was “But what was Voldemort’s original plan for dealing with this issue when he decided to teach at Hogwarts?”
Because I don't think he would have wanted to stake all his plans for the Stone and Harry on McGonagall coincidentally saying this just in time, and on Harry coincidentally being in a state where he obeys her instruction and never rethinks that decision. And Voldemort would definitely have known about the resonance problem before coming to Hogwarts. Even if he thought it might somehow be gone after ten years, he would have realised, at the latest after the encounter with Harry in Diagon Alley, that it wasn't. So what was his original plan for making sure Harry wouldn't talk about the resonance to anyone important? Between the vow and the resonance itself, his means of reliably controlling Harry's actions are really very sharply limited.
Every plan I’ve managed to come up with either doesn’t fit with Voldemort’s actual actions in the story, or doesn't seem nearly reliable enough for my mental model of Voldemort to be satisfied with the whole crazy "Let's just walk into Hogwarts, become a teacher, and hang out there for maybe a year" idea.
Eliezer: Right. But there’s more! This model also explains why, when Harry faces the Dementor and is lost in his dark side, and Hermione brings him out of it with a kiss,[18] Harry’s dark side has nothing to say about that kiss, it’s at a loss. Meanwhile, the main part of Harry has a thought process activated.
I picked up on this, though my main guess was that Tom Riddle had just always been aromantic and asexual. I didn’t think any dark rituals were involved.
I don't think it's true that Noosphere's comment did not contain an argument. The rest of the comment, after the passage you cited, tries to lay out a model for why continual learning and long-term memory might be the only remaining bottlenecks. Perhaps you think that argument is very bad, but it is an argument, and I didn't think your reply to it was helpful for the discussion.
My guess is this is obvious, but IMO it seems extremely unlikely to me that bee-experience is remotely as important to care about as cow experience.
I agree with this, but would strike the 'extremely'. I don't actually have gears-level models for how some algorithms produce qualia. 'Something something, self-modelling systems, strange loops' is not a gears-level model. I mostly don't think a million-neuron bee brain would be doing qualia, but I wouldn't say I'm extremely confident.
Consequently, I don't think people who say bees are likely to be conscious are so incredibly obviously making a mistake that we have to go looking for some signalling explanation for them producing those words.
But there's no reason to think that the model is actually using a sparse set of components/features on any given forward pass.
I contest this. If a model wants to implement more computations (for example, logic gates) in a layer than that layer has neurons, the known methods for doing this rely on few computations being used (that is, receiving a non-baseline input) on any given forward pass.
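To make this concrete, here is a minimal numpy sketch of the kind of construction I have in mind (the numbers, the random nearly-orthogonal directions, and the thresholded dot-product readout are all illustrative assumptions, not taken from any particular paper). It packs 300 "copy this feature through" computations into a 100-neuron layer, and it only works while few of those computations receive a non-baseline input on a given forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 100, 300          # more computations than neurons

# Each feature gets a random (hence nearly orthogonal) direction in neuron space.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def run_layer(active_idx, threshold=0.6):
    """Superpose the active features, then read each feature back out with a
    thresholded dot product (a crude stand-in for one ReLU neuron per 'gate')."""
    f = np.zeros(n_features)
    f[active_idx] = 1.0
    x = W.T @ f                            # the layer's activation vector
    return (W @ x > threshold).astype(float), f

def avg_errors(k, trials=50):
    errs = 0
    for _ in range(trials):
        active = rng.choice(n_features, size=k, replace=False)
        out, f = run_layer(active)
        errs += int(np.sum(out != f))
    return errs / trials

print("avg errors with  3 of 300 features active:", avg_errors(3))
print("avg errors with 60 of 300 features active:", avg_errors(60))
```

With 3 of 300 features active, the interference terms are far below the threshold and the readout makes well under one error per forward pass on average; with 60 active, the interference is the same size as the signal and the readout makes dozens of errors per pass.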
I'd have to think about the exact setup here to make sure there's no weird caveats, but my first thought is that for , this ought to be one component per bigram, firing exclusively for that bigram.
An intuition pump: Imagine the case of two scalar features being embedded along two vectors. If you consider a series that starts with the two vectors being orthogonal, then gives them ever higher cosine similarity, I'd expect the network to have ever more trouble learning to read the two features out, until we hit cosine similarity 1, at which point the network definitely cannot learn to read the features out at all. I don't know exactly how the learning difficulty behaves over this series, but it sure seems to me like it ought to go up monotonically at least.
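As a rough numerical stand-in for "learning difficulty" (a sketch under assumed numbers, using an exactly-solved least-squares readout plus a little observation noise rather than an actually trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, noise = 10_000, 1_000, 0.05

for c in [0.0, 0.5, 0.9, 0.99, 0.999]:
    # Two unit embedding vectors with cosine similarity c, as the columns of V.
    V = np.column_stack([[1.0, 0.0], [c, np.sqrt(1 - c**2)]])

    def make_data(n):
        f = rng.normal(size=(n, 2))                     # the two scalar features
        x = f @ V.T + noise * rng.normal(size=(n, 2))   # embed, plus small noise
        return x, f

    x_tr, f_tr = make_data(n_train)
    x_te, f_te = make_data(n_test)

    # Best linear readout of the features from x, fit by least squares.
    readout, *_ = np.linalg.lstsq(x_tr, f_tr, rcond=None)
    mse = np.mean((x_te @ readout - f_te) ** 2)
    print(f"cosine similarity {c:5.3f}: test MSE of recovered features = {mse:.4f}")
```

The recovery error stays near the noise floor while the vectors are far from parallel and climbs sharply as the cosine similarity approaches 1.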
Another intuition pump: The higher the cosine similarity between the features, the larger the norms of the rows of the readout matrix will be, with the norm going to infinity in the limit of the cosine similarity going to one.
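Concretely, in the same two-feature toy picture (two unit embedding vectors with cosine similarity $c$; this is just an illustrative calculation, not a claim about any real trained model), the embedding matrix and its exact linear readout are

$$V = \begin{pmatrix} 1 & c \\ 0 & \sqrt{1-c^{2}} \end{pmatrix}, \qquad V^{-1} = \begin{pmatrix} 1 & -\dfrac{c}{\sqrt{1-c^{2}}} \\ 0 & \dfrac{1}{\sqrt{1-c^{2}}} \end{pmatrix},$$

and both rows of $V^{-1}$ have norm $1/\sqrt{1-c^{2}}$, which equals $1$ in the orthogonal case and diverges as $c \to 1$.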
I agree that at the cosine similarity in question, it's very unlikely to be a big deal yet.
Sure, yes, that's right. But I still wouldn't take this to be equivalent to our embedding vectors literally being orthogonal, because the trained network itself might not learn this transformation perfectly.
I can't recall ever taking a test in school or university where time wasn't a pretty scarce resource, unless it was easy enough that I could just get everything right before the deadline without needing to rush.