All of Muireall's Comments + Replies

I have a moment so I'll summarize some of my thinking here for the sake of discussion. It's a bit more fleshed out at the link. I don't say much about AI capabilities directly since that's better-covered by others.

In the first broad scenario, AI contributes to normal economic growth and social change. Key drivers limit the size and term of bets industry players are willing to make: [1A] the frontier is deeply specialized into a particular paradigm, [1B] AI research and production depend on lumpy capital projects, [1C] firms have difficulty capturing profit... (read more)

3Muireall
I have a moment so I'll summarize some of my thinking here for the sake of discussion. It's a bit more fleshed out at the link. I don't say much about AI capabilities directly since that's better-covered by others.

In the first broad scenario, AI contributes to normal economic growth and social change. Key drivers limit the size and term of bets industry players are willing to make: [1A] the frontier is deeply specialized into a particular paradigm, [1B] AI research and production depend on lumpy capital projects, [1C] firms have difficulty capturing profits from training and running large models, and [1D] returns from scaling new methods are uncertain.

In the second, AI drives economic growth, but bottlenecks in the rest of the economy limit its transformative potential. Key drivers relate to how much AI can accelerate the non-AI inputs to AI research and production: [2A] limited generality of capabilities, [2B] limited headroom in capabilities, [2C] serial physical bottlenecks, and [2D] difficulty substituting theory for experiment.

Indicators (hypothetical observations that would lead us to expect these drivers to have more influence) include:

1. Specialized methods, hardware, and infrastructure dominate those for general-purpose computing in AI. (+1A)
2. Training and deployment use different specialized infrastructure. (+1A, +1B)
3. Generic progress in the semiconductor industry only marginally advances AI hardware. (+1A)
4. Conversely, advances in AI hardware are difficult to repurpose for the rest of the semiconductor industry. (+1A)
5. Specialized hardware production is always scaling to meet demand. (+1A)
6. Research progress is driven chiefly by what we learn from the largest and most expensive projects. (+1B, +1D, +2D)
7. Open-source models and second-tier competitors lag the state of the art by around one large training run. (+1C, +1D)
8. Small models can be cheaply trained once expensive models are proven, achieving results nearly as

Yeah, plus all the other stuff Alexander and Metz wrote about it, I guess.

3Said Achmiz
Could you (or someone else) summarize the other stuff, in the context of my question? I mean, I read it, there’s various things in there, but I’m not sure which of it is supposed to be a definition of “making space for” an idea.

It's just a figure of speech for the sorts of thing Alexander describes in Kolmogorov Complicity. More or less the same idea as "Safe Space" in the NYT piece's title—a venue or network where people can have the conversations they want about those ideas without getting yelled at or worse.

Mathematician Andrey Kolmogorov lived in the Soviet Union at a time when true freedom of thought was impossible. He reacted by saying whatever the Soviets wanted him to say about politics, while honorably pursuing truth in everything else. As a result, he not only made grea

... (read more)
6Said Achmiz
So, basically, allowing the ideas in question to be discussed on one’s blog/forum/whatever, instead of banning people for discussing them?

That section is framed with

Part of the appeal of Slate Star Codex, faithful readers said, was Mr. Siskind’s willingness to step outside acceptable topics. But he wrote in a wordy, often roundabout way that left many wondering what he really believed.

More broadly, part of the piece's thesis is that the SSC community is the epicenter of a creative and influential intellectual movement, some of whose strengths come from a high tolerance for entertaining weird or disreputable ideas.

Metz is trying to convey how Alexander makes space for these ideas without stak... (read more)

8Jiro
It doesn't just pattern match to a clumsy smear. It's also not the only clumsy smear in the article. You're acting as though that's the only questionable thing Metz wrote and that taken in isolation you could read it in some strained way to keep it from being a smear. It was not published in isolation.
3Said Achmiz
What does it mean to “make space for” some idea(s)?
Muireall*1-8

In 2021, I was following these events and already less fond of Scott Alexander than most people here, and I still came away with the impression that Metz's main modes were bumbling and pattern-matching. At least that's the impression I've been carrying around until today. I find his answers here clear, thoughtful, and occasionally cutting, although I get the impression he leaves more forceful versions on the table for the sake of geniality. I'm wondering whether I absorbed some of the community's preconceptions or instinctive judgments about him or journalists in general.

I do get the stubbornness, but I read that mostly as his having been basically proven right (and having put in the work at the time to be so confident).

Answer by Muireall*245

In the 2D case, there's no escaping exponential decay of the autocorrelation function for any observable satisfying certain regularity properties. (I'm not sure if this is known to be true in higher dimensions. If it's not, then there could conceivably be traps with sub-exponential escape times or even attractors, but I'd be surprised if that's relevant here—I think it's just hard to prove.) Sticking to 2D, the question is just how the time constant in that exponent for the observable in question compares to 20 seconds.
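In symbols, a minimal sketch of that claim for an observable A with correlation time τ (my notation, not from the original):

```latex
C_A(t) = \langle A(0)\, A(t) \rangle - \langle A \rangle^2 \;\lesssim\; C_A(0)\, e^{-t/\tau}
```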

The presence of persistent collective... (read more)

Related would be some refactoring of Deception Chess.

When I think about what I'd expect to see in experiments like that, I get curious about a sort of "baseline" set of experiments without deception or even verbal explanations. When can I distinguish the better of two chess engines more efficiently than playing them against each other and looking at the win/loss record? How much does it help to see the engines' analyses over just observing moves? 

How is this related? Well, how deep is Chess? Ratings range between, say, 800 and 3500, with 300 points be... (read more)

I sometimes wonder how much we could learn from toy models of superhuman performance, in terms of what to expect from AI progress. I suspect the answer is "not much", but I figured I'd toss some thoughts out here, as much to discharge any need I feel to think about them further as to see if anyone has any good pointers here.

Like—when is performance about making super-smart moves, and when is it about consistently not blundering for as long as possible? My impression is that in Chess, something like "average centipawn loss" (according to some analysis engin... (read more)

1Muireall
Related would be some refactoring of Deception Chess.

When I think about what I'd expect to see in experiments like that, I get curious about a sort of "baseline" set of experiments without deception or even verbal explanations. When can I distinguish the better of two chess engines more efficiently than playing them against each other and looking at the win/loss record? How much does it help to see the engines' analyses over just observing moves?

How is this related? Well, how deep is Chess? Ratings range between, say, 800 and 3500, with 300 points being enough to distinguish players (human or computer) reasonably well. So we might say there are about 10 "levels" in practice, or that it has a rating depth of 10. If Chess were Best-Of-30 ChessMove as described above, then ChessMove would have a rating depth a bit below 2 (just dividing by √30). In other words, we'd expect it to be very hard to ever distinguish any pair of engines off a single recommended move—and difficult with any number of isolated observations, given our own error-prone human evaluation. If it's closer to Best-Of-30 Don'tBlunder, it's a little more complicated—usually you can't tell the difference because there basically is none, but on rare pivotal moves it will be nearly as easy to tell as when looking at a whole game.

The solo version of the experiment looks like this:

1. I find a chess engine with a rating around mine, and use it to analyze positions in games against other engines. Play a bunch of games to get a baseline "hybrid" rating for myself with that advisor.
2. I do the same thing with a series of stronger chess engines, ideally each within a "level" of the last.
3. I do the same thing with access to the output of two engines, and I'm blinded to which is which. (The blinding might require some care around, for example, timing, as well as openings.) In sub-experiment A, I only get top moves and their scores. In sub-experiment B, I can look at lines from the current position up
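To make the rating-depth arithmetic concrete, here's a minimal sketch in Python (my own illustration, using the standard Elo expected-score formula and the √30 scaling described above):

```python
import math

def elo_expected_score(delta):
    """Expected score for a player rated `delta` points above their opponent."""
    return 1 / (1 + 10 ** (-delta / 400))

rating_range = 3500 - 800            # span of practical chess ratings
level = 300                          # points that distinguish players reasonably well
depth_chess = rating_range / level   # roughly 9-10 "levels"

# If Chess were Best-Of-30 independent ChessMove contests, per-move rating
# differences would shrink by roughly sqrt(30), so the per-move rating depth is:
depth_chessmove = depth_chess / math.sqrt(30)

print(f"Expected score of a 300-point favorite: {elo_expected_score(300):.2f}")  # ~0.85
print(f"Rating depth of Chess: ~{depth_chess:.0f}")            # ~9
print(f"Rating depth of ChessMove: ~{depth_chessmove:.1f}")    # ~1.6, a bit below 2
```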

Many times have I heard people talk about ideas they thought up that are ‘super infohazardous’ and ‘may substantially advance capabilities’ and then later when I have been made privy to the idea, realized that they had, in fact, reinvented an idea that had been publicly available in the ML literature for several years with very mixed evidence for its success – hence why it was not widely used and known to the person coming up with the idea.

I’d be very interested if anyone has specific examples of ideas like this they could share (that are by now widely ... (read more)

7RogerDearnaley
I'm not "on the inside", but my understanding is that some people at Conjecture came up with Chain of Thought prompting and decided that it was infohazardous, I gather fairly shortly before preprints describing it came out in the open AI literature. That idea does work well, but was of course obvious to any schoolteacher.

It sounds like you're saying that you can tell once someone's started transitioning, not that you can recognize trans people who haven't (or who haven't come out, at least not to a circle including you), right? Whether or not you're right, the spirit of this post includes the latter, too.

1Bezzi
Right, I don't claim to be able to spot trans people who haven't started transitioning, but at least for those who have finished the transition, I assume that a prolonged interaction would reveal at least some clues. Take, I don't know, my conservatory (at least 60-80 people I personally interacted with for years, including some of those gays and lesbians from the previous post). Even if I talk with these people mostly about music, I would be truly shocked to find out that one of them was trans all along. Do you want larger numbers? My father runs a small business with ~1000 customers, and most of them have been the same for years. Even if he doesn't personally know all of them, I am quite sure that he would notice if one of them transitioned. So far, he has not.

This reasoning is basically right, but the answer ends up being 5 for a relatively mundane reason.

If the time-averaged potential energy is k_B T / 2, so is the kinetic energy. Because damping is low, at some point in a cycle, you'll deterministically have the sum of the two in potential energy and nothing in kinetic energy. So you do have some variation getting averaged away.

More generally, while the relaxation timescale is the relevant timescale here, I also wanted to introduce an idea about very fast measurement events like the closing of the electrical

... (read more)
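A rough version of that arithmetic as I read it (my own reconstruction, treating the oscillator's total energy as Boltzmann-distributed and counting roughly one fresh draw per 0.1 ns relaxation time):

```latex
\tfrac{1}{2} k \sigma_x^2 = \tfrac{1}{2} k_B T
\;\Rightarrow\;
U(6\sigma_x) = \tfrac{1}{2} k (6\sigma_x)^2 = 18\, k_B T,
\qquad
P(E > 18\, k_B T) = e^{-18} \approx 1.5 \times 10^{-8}
```

So with on the order of 10 relaxation times per 1 ns clock cycle, p is roughly 10 × 1.5×10⁻⁸ ≈ 1.5×10⁻⁷, which lands above 10⁻⁷.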
6DaemonicSigil
Muireall*101

Setting aside most problems with the original, I've always found this interferometer example an unsatisfying introduction because it's surprisingly ambiguous exactly what's quantum mechanical here or what's special about quantum mechanics.

You have superposition and interference in classical electromagnetism. That's enough for everything until you get to the two-photon experiment (that is, for everything in "Configurations and Amplitude"). Single photons and photon counters are posited, but these are taken as given where I would sooner take as given the ide... (read more)

I'd be up for a dialogue mostly in the sense of the first bullet about "making sense of technical debates as a non-expert". As far as it's "my domain" it's in the context of making strategic decisions in R&D, but for example I'd also consider things like understanding claims about LK-99 to fall in the domain.

I think on and off about how one might practice (here's one example) and always come away ambivalent. Case studies and retrospectives are valuable, but lately, I tend to lean more pragmatic—once you have a basically reasonable approach, it's often ... (read more)

2Raemon
Yeah I am interested in chatting about this.

The first! These are things I think about sometimes, but I suspect I'd be better at asking interesting questions than at saying interesting things myself.

I'm interested in being a dialogue partner for any of the things I tend to post about, but maybe especially:

  • Explosive economic growth—I'm skeptical of models like Tom Davidson's but curious to probe others' intuitions. For example, what would speeding up hardware research by a factor of 10 or more look like?
  • Molecular nanotechnology, for example clarifying my critique of the use of equilibrium statistical mechanics in Nanosystems
  • Maybe in more of an interviewer/rubber-duck role: Forecasting and its limits, professional ethics in EA, making sense of scientific controversies
  • My own skepticism about AI x-risk as a priority
3Raemon
Doublechecking understanding: in this scenario you’re looking for someone who has opinions about forecasting, and interviewing them? Or did you wanna be interviewed?
Muireall1613

My understanding is that perpetrator sexuality has little to do with the gender of chosen victims in child sexual abuse. If Annie was four years old and Sam thirteen at the time, I don't think attraction to women played much of a role either way.

3tailcalled
Ehh, idk. Obviously pedophiles are much more likely to sexually assault children than teliophiles are, and from what I've heard pedophiles are more likely to have no particular preference (or only weak preferences) for whether their victims are male or female. But pedophilic child molesters tend to have strong preferences for children, which is in tension with Sam Altman being attracted to adult men.

Alternatively, I've heard that some teliophiles molest children out of opportunism, but that seems somewhat counterintuitive to me (in order to see children as a sexual opportunity, wouldn't they need to be attracted to them?). It's less counterintuitive if we're talking about teens (sexual attractiveness to teliophiles tends to gradually increase with age, rather than suddenly spiking up at the age of consent), but that doesn't square with Annie being four years old. I'm pretty sure this type of child molester tends to have a correspondence between their preference for adults' sex and their preference for children's sex, but I also think their preference for children's sex is weaker than their preference for adults' sex.

These explanations are all making reference to the perpetrator's sexuality, though of course in much more complex and nuanced ways than gay/straight/bi.

Sure. I only meant to use Thomas's frame, where it sounds like Thomas did originally accept Nate's model on some evidence, but now feels it wasn't enough evidence. What was originally persuasive enough to opt in? I haven't followed all Nate's or Eliezer's public writing, so I'd be plenty interested in an answer that draws only from what someone can detect from their public writing. I don't mean to demand evidence from behind the confidentiality screen, even if that's the main kind of evidence that exists.

Separately, I am skeptical and a little confused as to what this could even look like, but that's not what I meant to express in my comment.

Muireall2224

The model was something like: Nate and Eliezer have a mindset that's good for both capabilities and alignment, and so if we talk to other alignment researchers about our work, the mindset will diffuse into the alignment community, and thence to OpenAI, where it would speed up capabilities. I think we didn't have enough evidence to believe this, and should have shared more.

What evidence were you working off of? This is an extraordinary thing to believe.

First I should note that Nate is the one who most believed this; that we not share ideas that come from Nate was a precondition of working with him. [edit: this wasn't demanded by Nate except in a couple of cases, but in practice we preferred to get Nate's input because his models were different from ours.]

With that out of the way, it doesn't seem super implausible to us that the mindset is useful, given that MIRI had previously invented out of the box things like logical induction and logical decision theory, and that many of us feel like we learned a lot... (read more)

Raemon*119

This isn't quite how I'd frame the question.

[edit: My understanding is that] Eliezer and Nate believe this. I think it's quite reasonable for other people to be skeptical of it. 

Nate and Eliezer can choose to only work closely/mentor people who opt into some kind of confidentiality clause about it. People who are skeptical or don't think it's worth the costs can choose not to opt into it.

I have heard a few people talk about MIRI confidentiality norms being harmful to them in various ways, so I do also think it's quite reasonable for people to be more ... (read more)

Measuring noise and measurement noise

You're using an oscilloscope to measure the thermal noise voltage across a resistance R. Internally, the oscilloscope has a parallel input resistance R_in and capacitance C, where the voltage on the capacitor is used to deflect electrons in a cathode ray tube to continuously draw a line on the screen proportional to the voltage over time.

The resistor and oscilloscope are at the same temperature. Is it possible to determine R from the amplitude of the fluctuating voltage shown on the oscillos... (read more)

Molecular electromechanical switch

You've attached one end of a conductive molecule to an electrode. If the molecule bends by a certain distance d at the other end, it touches another electrode, closing an electrical circuit. (You also have a third electrode where you can apply a voltage to actuate the switch.)

You're worried about the thermal bending motion of the molecule accidentally closing the circuit, causing an error. You calculate, using the Boltzmann distribution over the elastic potential energy in the molecule, that the probability of a ... (read more)

1DaemonicSigil
EDIT: added spoiler formatting

Whether this imbalance can possibly be cheaply engineered away or not might determine the extent to which the market for AI deployment (which may or may not become vertically disintegrated from AI R&D and training) is dominated by a few small actors, and seems like an important question about hardware R&D. I don't have the expertise to judge to what extent engineering away these memory bottlenecks is feasible and would be interested to hear from people who do have expertise in this domain.

You may know this, but "in-memory computing" is the major se... (read more)

4boubounet
On a different topic, but responding to the same quote: advances in model quantization that significantly reduce memory consumption for inference without reducing model performance might also mitigate the imbalance between ALU ops and memory bandwidth. This might only shift the problem a few orders of magnitude away, but I still think it's worth mentioning.
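As a rough illustration of the point (a hypothetical 70B-parameter model, not a claim about any particular system):

```python
# Back-of-the-envelope: bytes of weights that must be stored and streamed per
# forward pass, for a hypothetical 70B-parameter model at various precisions.
params = 70e9
for fmt, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{fmt}: ~{gigabytes:.0f} GB of weights")
```

The ratios are what matter for memory bandwidth; the absolute numbers here are only illustrative.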

Since Raemon's Thinking Physics exercise I've been toying with writing physics puzzles along those lines. (For fun, not because I'm aiming to write better exercise candidates.) If you assume an undergrad-level background and expand to modern physics and engineering there are interesting places you can go. I think a lot about noise and measurement, so that's where my mind has been. Maybe some baseline questions could look like the below? Curious to hear anyone's thoughts.

Pushing a thermal oscillator

You're standing at one end of a grocery aisle. In your cart... (read more)

3DaemonicSigil
EDIT: added spoiler formatting
7Muireall
Measuring noise and measurement noise

You're using an oscilloscope to measure the thermal noise voltage across a resistance R. Internally, the oscilloscope has a parallel input resistance R_in and capacitance C, where the voltage on the capacitor is used to deflect electrons in a cathode ray tube to continuously draw a line on the screen proportional to the voltage over time. The resistor and oscilloscope are at the same temperature.

Is it possible to determine R from the amplitude of the fluctuating voltage shown on the oscilloscope?

1. Yes, if R_in ≪ R
2. Yes, if R_in ∼ R
3. Yes, if R_in ≫ R
4. No
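For anyone working through it, two textbook relations seem relevant here, offered as background rather than as the intended solution: the Johnson-Nyquist noise spectral density and equipartition for the capacitor voltage at equilibrium.

```latex
S_V(f) = 4 k_B T R \quad \text{(one-sided, per resistor)},
\qquad
\tfrac{1}{2} C \langle V_C^2 \rangle = \tfrac{1}{2} k_B T
\;\Rightarrow\;
\langle V_C^2 \rangle = \frac{k_B T}{C}
```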
4Muireall
Molecular electromechanical switch

You've attached one end of a conductive molecule to an electrode. If the molecule bends by a certain distance d at the other end, it touches another electrode, closing an electrical circuit. (You also have a third electrode where you can apply a voltage to actuate the switch.)

You're worried about the thermal bending motion of the molecule accidentally closing the circuit, causing an error. You calculate, using the Boltzmann distribution over the elastic potential energy in the molecule, that the probability of a thermal deformation of at least d is 10⁻⁹ (a single-tailed six-sigma deformation in a normal distribution where expected potential energy is k_B T/2), but you don't know how to use this information. You know that the bending motion has a natural frequency of 100 GHz with an energy decay timescale of 0.1 nanosecond, and that it behaves as an ideal harmonic oscillator in a thermal bath. You're considering integrating this switch into a 1 GHz processor.

What is the probability p of an error in a 1 nanosecond clock cycle?

1. p < 10⁻⁹ — the Boltzmann distribution is a long-time limit, so you have sub-Boltzmann probability in finite time.
2. p = 10⁻⁹ — the probability is determined by the Boltzmann distribution.
3. p ≈ 10⁻⁸ — the 0.1 nanosecond damping timescale means, roughly, it gets 10 draws from the Boltzmann distribution.
4. p ≈ 10⁻⁷ — the 100 GHz natural frequency means it gets 100 tries to cause an error.
5. p > 10⁻⁷ — the Boltzmann distribution is over long-time averages, so you expect larger deviations on short timescales that otherwise get averaged away.

Stochastic thermodynamics may be the general and powerful framework you're looking for regarding molecular machines.

I enjoyed it, although I'm already the sort of person who thinks Thinking Physics is fun—both the problem solving and the nitpicking about what constitutes a correct explanation. It seems worth doing at least a handful of problems this way, and more broadly deliberately practicing problem solving and metacognition about problem solving. Thinking Physics could be a good complement to Problem Solving Through Problems or How To Solve It, since in my (limited) experience you get quickly diminishing returns to anything but competition math with collections like that.

Answer by Muireall110

OK, a shot at Challenge I, with Poof and Foop, Steam Locomotive, and Expansion of Nothing. Felt like all three are in the sweet spot. I personally dislike Expansion of Nothing.

Poof and Foop:

The problem statement is a bit leading: there's some kind of inversion symmetry relationship between the two cases, so it should go the opposite direction, right?

Initially, definitely. The puncture means that there's less pressure on the right side—instead of colliding with the can, some particles go inside.

But those particles end up colliding with the interior left sid

... (read more)
3Raemon
3Raemon

For context if anyone needs it, the Physics GRE is (was?) a multiple-choice exam where you get penalized for wrong answers but not for blanks. It works out so that if you eliminate one answer there's no harm in guessing, in expectation. There's also considerable time pressure—something like 90 seconds per question on average.
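Concretely, assuming the usual subject-test scoring of +1 for a correct answer and −1/4 for a wrong one among five choices (my recollection of the format, not stated above):

```latex
E[\text{blind guess}] = \tfrac{1}{5}(1) + \tfrac{4}{5}\left(-\tfrac{1}{4}\right) = 0,
\qquad
E[\text{guess after eliminating one}] = \tfrac{1}{4}(1) + \tfrac{3}{4}\left(-\tfrac{1}{4}\right) = \tfrac{1}{16} > 0
```

So a blind guess breaks even, and eliminating even one option makes guessing positive in expectation.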

how much deliberate effort you put into calibrating yourself on "how much effort to put into multiple choice questions"

Enough to get through all questions with some time left over, even if that meant guessing on some I could fully solv... (read more)

I only ever flipped through Thinking Physics for fun, but what I remember is that I tended to miss easier problems more often. If I spent time thinking about one, really making sure I got it right, I'd probably get it. Outside those, there were some that really were elementary, but I'd often find myself thinking I'd looked at the author's answer too soon—a self-serving "well, I would have gotten this, if I were really trying." I might say the problem was that I couldn't tell when I needed to really try.

This does remind me a bit of how I studied for the phy... (read more)

2Raemon
I am interested in
  • how much deliberate effort you put into calibrating yourself on "how much effort to put into multiple choice questions"
  • whether you put any deliberate effort into transferring that into the PhD experience
  • what did you actually do in your PhD experience?
  • what do you think would have better prepared you for the PhD experience?

From the six-author paper: "In the first region below red-arrow C (near 60 °C), equivalent to region F in the inset of Fig. 5, the resistivity with noise signals can be regarded as zero." But by "noise signals" they don't mean measurement noise (and their region C doesn't look measurement-noise limited, unless their measurement apparatus is orders of magnitude less sensitive than it should be) but rather sample physics—later in that paragraph: "The presence of noise in the zero-resistivity region is often attributed to phonon vibrations at higher temperature... (read more)

Muireall*372

I did a condensed matter experiment PhD, but high Tc is not my field and I haven't spent much time thinking about this. [Edit: I didn't see Charlie Steiner's comment until I had written most of this comment and started editing for clarity. I think you can treat this as mostly independent.] Still, some thoughts on Q1, maybe starting with some useful references:

Bednorz and Müller, "Possible High Tc Superconductivity in the Ba - La - Cu - O System" (1986) was the inciting paper for the subsequent discoveries of high-Tc superconductors, notably the roughly sim... (read more)

2Liam Donovan
Don't the authors claim to have measured 0 resistivity (modulo measurement noise)?

I hope you don't mind my piling on. This is one of those things (along with moral foundations theory) where it really frustrates me that people seem to believe it has much better scientific backing than it does.

My concern is that people didn't start believing in the EQ-SQ theory based on statistical correlations found with Simon Baron-Cohen's scales. They presumably started believing in it based on fuzzy intuitions arrived at through social experience.

It's hard to gloss empathizing-systemizing/extreme male brain as a theory that Baron-Cohen himself arrived... (read more)

I'm speaking from memory of reporting here, but my understanding is that there was a specific turning point in 2019/2020 when one of these orgs focus tested a bunch of messages and found that trans youth issues worked well, particularly as a wedge. (That is, the opposition were split on it.) US Americans, when polled, have a bunch of weirdly self-contradictory answers to questions on trans issues but are generally more supportive than not, depending on how they're asked. My guess is they mostly don't think about it too much, since there are plausibly a mil... (read more)

I don't know if "decentralized" is quite right. For example, the Alliance Defending Freedom specifically has been an instrumental driving force behind legislative pushes in many states in the last few years. Legislation, election messaging, and media cycles more generally tend to be driven by a handful of conservative/evangelical nonprofits with long-established national operations, also including, for example, the Independent Women's Forum and the Family Research Council. I would also characterize it as more of a wedge issue than a resonant one, although that's more complicated.

4tailcalled
So maybe an accurate model is something like: For a while, there have been conservative organizations working on legal foundations etc. to challenge the trans movement, mostly working in the background. As the trans movement has grown, so has decentralized populist opposition to it, until recently where the concerns of influencers such as Libs of TikTok have become so big that they have been picked up by a lot of conservative media. And finally, something happened in Elon Musk's family which probably involves Elon Musk wanting to prevent Vivian Jenna Wilson from transitioning and wanting to treat her as male, or Elon Musk being mad at Grimes, and therefore using a big chunk of his wealth for buying a major progressive media institution to push back against transgender ideology.

Poorly constructed public narratives, though, make for bad policy and bad culture.

Do they, though? I'm honestly not too worried about this. That's one reason I mentioned "born this way". Of course, I think that even just going by self-reports, "internal sense of gender" is a reasonable first approximation with wide coverage; I think the current policy and cultural agendas for trans rights are pretty much the right ones; and I think that's true pretty much regardless of the "underlying truth of the phenomenon".

"My body, my choice" has already been thoroughly absorbed

... (read more)

It's often hard to get a good handle on a proposition if you don't feel able to talk about it with people who disagree. I've offered in the past to privately go over any potentially dangerous ideas anyone thinks they've found.

There aren't really novel thoughts and arguments to preserve nuance from—most of what the summary misses is that this is a story of the author's personal psychological journey through the bullet points. I understand why it's been downvoted, but I'm glad someone was forthright about how tedious it is to read more tens of thousands of words of variations on this particular theme.

Muireall1211

I'm not sure how much of the narration is about you in the present day, or exactly what you're looking for from your audience, but there's a bit I still want to respond to.

I'm a transhumanist. I believe in morphological freedom. If someone wants to change sex, that's a valid desire that Society should try to accommodate as much as feasible given currently existing technology. In that sense, anyone can choose to become trans.

The problem is that the public narrative of trans rights doesn't seem to be about making a principled case for morphological freedom,

... (read more)

Poorly constructed public narratives, though, make for bad policy and bad culture. Yes, much of it carries the instrumental goal of pragmatic trans acceptance, but it's often presented in such a way as to not only elide the complexities of that acceptance, but to make any discussion of policy trade-offs or personal disagreements radioactive. More, people tend to be poor at distinguishing between "narrative-simplicity" statements and truths worth orienting one's life around.

Morphological freedom is a powerful and unifying principle that is easily, ... (read more)

Muireall*134

Thanks for replying. This is a lot clearer to me than prior threads, although it also seems as though you're walking back some of your stronger statements.

I think this is still not quite a correct picture. I agree with this:

For electronic devices at maximum packing density where you naturally represent bits with single electrons, and the de broglie wavelength is then quite relevant as a constraint on maximum packing density due to quantum tunneling etc.

However, at maximum packing density with single-electron switches, the energy requirements per area of in... (read more)

Muireall*222

The "tile"/cellular-automaton model comes from Cavin et al., "Science and Engineering Beyond Moore's Law" (2012) and its references, particularly those by Cavin and Zhirnov, including Shankar et al. (2009) for a "detailed treatment". As @spxtr says in a comment somewhere in the long thread, these papers are fine, but don't mean what Jacob Cannell takes them to mean.

That detailed treatment does not describe energy demands of interconnects (the authors assume "no interconnections between devices" and say they plan to extend the model to include interconnect ... (read more)

Oh, no. I just meant to highlight that it was a physically incorrect picture. Metallic conduction doesn’t remotely resemble the “electronic cellular automata” picture, any version of which would get the right answer only accidentally, I agree. A calculation based on information theory would only care about the length scale of signal attenuation.

Even for the purposes of the cellular model, the mean free path is about as unrelated to the positional extent of an electron wavefunction as is the de Broglie wavelength.

Muireall*2010

The picture from Eli Yablonovitch described here is basically right as far as I can tell, and Jacob Cannell's comment here seems to straightforwardly state why his method gets a different answer [edit: that is, it is unphysical]:

But in that sense I should reassert that my model applies most directly only to any device which conveys bits relayed through electrons exchanging orbitals, as that is the generalized electronic cellular automata model, and wires should not be able to beat that bound. But if there is some way to make the interaction distance much m

... (read more)

4Steven Byrnes
I understand the second part of this comment to be saying that Jacob & I can reconcile based on the fact that the electron mean free path in metal wires is actually much larger than 1 nm. If that's what you're saying, then I disagree. If the lowest possible interconnect loss is a small multiple of kT/(electron mean free path in the wire), then I claim it's a coincidence. (I don't think that premise is true anyway; I think they are off by like 4 OOM or something. I think there is like 6 OOM room for improvement in interconnect loss compared to Jacob's model, so replacing 1 nm with copper mean free path = 40 nm in Jacob's model is insufficient to get reconciliation.)

I think that, if there were two metal wires A & B, and wire A had 10× higher density of mobile electrons than B, each with 10× lower effective mass than B, but the electrons in A have 100× lower mean free path than B, then the resistivities of A & B would be the same, and in fact we would not be able to tell them apart at all, and in particular, their energy dissipation upon transmitting information would be the same.

One point of evidence, I claim, is that if I give you a metal wire, and don't tell you what it's made of, you will not be able to use normal electrical equipment to measure the electron mean free path for that wire. Whereas if the electron mean free path was intimately connected to electronic noise or binary data transmission or whatever, one might expect that such a measurement would be straightforward.

Yeah, we'll see [how transient the higher rates are]. It looks like NYC also saw a spike 2020-2022 (though I think rates per passenger mile are several times smaller) and this year isn't looking much better (going by https://www.nyc.gov/site/nypd/stats/reports-analysis/transit-bus.page and the NTD pages for the MTA).

For what it's worth, the 2022 CTA homicides were a huge outlier. The years 2001-2019 had 0-2 homicides each (filtering "CTA" in "Location"), then 4 in 2020 and 4 in 2021. Meanwhile, leading up to the pandemic, the National Transit Database says Passenger Miles Traveled was approaching 2 billion (https://www.transit.dot.gov/ntd/transit-agency-profiles/chicago-transit-authority), was down to 800 million for 2020 and 2021, and presumably came back up a lot in 2022 (no profile available yet).

I agree it's not infinitely safe, but I do suspect transit comes out safer by most analyses.

9jefftk
Good point! I see:

2001: 0, 2002: 0, 2003: 1, 2004: 2, 2005: 0, 2006: 0, 2007: 1, 2008: 2, 2009: 1, 2010: 0, 2011: 2, 2012: 0, 2013: 1, 2014: 0, 2015: 0, 2016: 1, 2017: 2, 2018: 2, 2019: 2, 2020: 4, 2021: 4, 2022: 9, 2023: 2 (partial; annualizes to ~5)

Looks like some of 2022 homicides being high is something we should expect to continue (higher rates over several years) and some is it being unusual? For the denominator in my division I compared 2022 trips (which did come back up a bunch and is final) with the 2019 numbers for average trip length (2.5mi). Looking at the data you found for 2021 (which I didn't find; thanks for digging it up!) average trip length is 4mi (798,583,310 / 195,980,563). I'd expect 2022 average trip length to be somewhere in between?
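A quick back-of-the-envelope in Python using only the figures quoted in this thread (homicide counts above; passenger-mile totals from the NTD profile):

```python
# Homicides per billion passenger miles on the CTA, from numbers cited in this thread.
data = {
    2019: (2, 2.0e9),          # "approaching 2 billion" passenger miles pre-pandemic
    2021: (4, 798_583_310),    # NTD figure quoted above
}
for year, (homicides, passenger_miles) in sorted(data.items()):
    rate = homicides / (passenger_miles / 1e9)
    print(f"{year}: ~{rate:.1f} homicides per billion passenger miles")
```

This is only as rough as the inputs above, but it makes the per-mile comparison in the parent comment easier to see.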

Moreover, this is an estimate of effective FLOP, meaning that Cotra takes into account the possibility that software efficiency progress can reduce the physical computational cost of training a TAI system in the future. It was also in units of 2020 FLOP, and we're already in 2023, so just on that basis alone, these numbers should get adjusted downwards now.

Isn't it a noted weakness of Cotra's approach that most of the anchors don't actually depend on 2020 architecture or algorithmic performance in any concrete way? As in, if the same method were applied to... (read more)

Thanks, I see. I agree that a lot of confusion could be avoided with clearer language, but I think at least that they're not making as simple an error as you describe in the root comment. Ted does say in the EA Forum thread that they don't believe brains operate at the Landauer limit, but I'll let him chime in here if he likes.

I think the "effective FLOP" concept is very muddy, but I'm even less sure what it would mean to alternatively describe what the brain is doing in "absolute" FLOPs. Meanwhile, the model they're using gives a relatively well-defined e... (read more)

This claim is different from the claim that the brain is doing 1e20 FLOP/s of useful computation, which is the claim that the authors actually make.

Is it? I suppose they don't say so explicitly, but it sounds like they're using "2020-equivalent" FLOPs (or whatever it is Cotra and Carlsmith use), which has room for "algorithmic progress" baked in.

Perhaps you think the brain has massive architectural or algorithmic advantages over contemporary neural networks, but if you do, that is a position that has to be defended on very different grounds than "it would

... (read more)
3Ege Erdil
I think you're just reading the essay wrong. In the "executive summary" section, they explicitly state that and I don't know how you read those claims and arrived at your interpretation, and indeed I don't know how the evidence they provide could support the interpretation you're talking about. It would also be a strange omission to not mention the "effective" part of "effective FLOP" explicitly if that's actually what you're talking about.

If I understand correctly, the claim isn't necessarily that the brain is "doing" that many FLOP/s, but that using floating point operations on GPUs to do the amount of computation that the brain does (to achieve the same results) is very inefficient. The authors cite Single cortical neurons as deep artificial neural networks (Beniaguev et al. 2021), writing, "A recent attempt by Beniaguev et al to estimate the computational complexity of a biological neuron used neural networks to predict in-vitro data on the signal activity of a pyramidal neuron (the most... (read more)

4Ege Erdil
Recapitulating the response of Steven Byrnes to this argument: it may be very expensive computationally to simulate a computer in a faithful way, but that doesn't mean it's expensive to do the same computation that the computer in question is doing. Paraphrasing a nice quote from Richard Borcherds, it may be that teapots are very hard to simulate on a classical computer, but that doesn't mean that they are useful computational devices.

If we tried to simulate a GPU doing a simple matrix multiplication at high physical fidelity, we would have to take so many factors into account that the cost of our simulation would far exceed the cost of running the GPU itself. Similarly, if we tried to program a physically realistic simulation of the human brain, I have no doubt that the computational cost of doing so would be enormous. However, this is not what we're interested in doing. We're interested in creating a computer that's doing the same kind of computation as the brain, and the amount of useful computation that the brain could be doing per second is much less than 1e25 or even 1e20 FLOP/s.

If your point is that 1e25 FLOP/s is an upper bound on how much computation the brain is doing, I agree, but there's no reason to think it's a tight upper bound. This claim is different from the claim that the brain is doing 1e20 FLOP/s of useful computation, which is the claim that the authors actually make. If you have an object that implements some efficient algorithm that you don't understand, the object can be doing little useful computation even though you would need much greater amounts of computation to match its performance with a worse algorithm. The estimates coming from the brain are important because they give us a sense of how much software efficiency progress ought to be possible here.

My argument from the Landauer limit is about the number of bit erasures and doesn't depend on the software being implemented by the brain vs. a GPU. If the brain is doing something t

Just to follow up, I spell out an argument for a lower bound on dissipation that's 2-3 OOM higher in Appendix C here.

What I mean is that "the bit is in the radiator" is another state of the system where a two-level subsystem corresponding to the bit is coupled to a high-temperature bath. There's some transition rate between it and "the bit is in the CPU" determined by the radiator temperature and energy barrier between states. In particular, you need the same kind of energy "wall" as between computational states, except that it needs to be made large compared to the bath temperature to avoid randomly flipping your computational bit and requiring your active cooling to re... (read more)

1DaemonicSigil
Okay, thanks for clarifying. Treating "the bit is in the radiator" as a state of the system still seems a little like a strange way of phrasing it under my model. I would say something like "we have a particle in the CPU and a particle in the Radiator, each of which has 2 states so it can represent a bit. So the combined system has the states being: 00, 01, 10, 11. Moving the bit between the CPU and the Radiator looks like performing the following reversible mapping: 00 -> 00, 01 -> 10, 10 -> 01, 11 -> 11. (i.e. the bits are just swapped). The CPU and Radiator need to be coupled to achieve this mapping.

Now to actually have moved a bit from the CPU to the radiator, rather than just have swapped two bits, something needs to be true about the probability distribution over states. In particular, we need the bit in the radiator to be 0 with near certainty. (i.e. P(01) and P(11) are approximately 0) This is the entire reason why the Radiator needs to have a mechanism to erase bits, so it can have a particle that's almost certainly in state 0 and then that negentropy can be sent back to the CPU for use in computations.

So it sounds like you're concerned with the question of "after we erase a bit, how do we keep it from being corrupted by thermal noise before it's sent back to the CPU?" The categories of solution to this would be:

1. Keep it well isolated enough and send it back quick enough after the erasure is completed that there's no opportunity for it to be corrupted.
2. Stick up a high energy wall between the states so the bit can persist for a long time.

Either one of these would be fine from my perspective, I guess you and Jacob would say that we have to go with 2, and if you have that assumption then even the simple straightforward argument depends on being able to manipulate the energy wall without cost. I do think you can manipulate energy walls without cost, though. See my discussion with Jacob, etc.