All of Jonathan_Graehl's Comments + Replies

I'm unclear on whether the 'dimensionality' (complexity) component to be minimized needs revision from the naive 'number of nonzeros' (or continuous but similarly zero-rewarding priors on parameters).

Either:

  1. the simplest equivalent (by naive score) 'dimensionality' parameters are found by the optimization method, in which case what's the problem?
  2. not. Then either there's a canonicalization onto the equivalent parameters that can be used at each step, or an adjustment to the complexity score that does a good job of this, or we can't figure it out and we risk our optimization methods getting stuck in bad local grooves because of this.

Does this seem fair?

Jesse Hoogland
Let me see if I understand your question correctly. Are you asking: does the effective dimensionality / complexity / RLCT (λ) actually tell us something different from the number of non-zero weights? And if the optimization method we're currently using already finds low-complexity solutions, why do we need to worry about it anyway?

So the RLCT tells us the "effective dimensionality" at the largest singularity. This is different from the number of non-zero weights because there are other symmetries that the network can take advantage of.

The claim currently is more descriptive than prescriptive. It says that if you are doing Bayesian inference, then, in the limiting case of large datasets, this RLCT (which is a local thing) ends up having a global effect on your expected behavior. This is true even if your model is not actually at the RLCT.

So this isn't currently proposing a new kind of optimization technique. Rather, it's making a claim about which features of the loss landscape end up having the most influence on the training dynamics you see. This is exact for the case of Bayesian inference but still conjectural for real NNs (though there is early supporting evidence from experiments).
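For reference, the asymptotic behind the "global effect on expected behavior" claim is Watanabe's free-energy expansion. Stated from memory (so check the exact form against Watanabe 2009):

```latex
% Asymptotic expansion of the Bayes free energy F_n at sample size n:
% L_n(w_0) is the loss at the optimum, \lambda the RLCT, m its multiplicity.
% For regular models \lambda = d/2 (half the parameter count).
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log \log n + O_p(1)
```

The point is that λ, a local invariant of the deepest singularity, controls the log n term of the global free energy, so it dominates model selection at large n.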

This appears to be a high-quality book report. Thanks. I didn't see anywhere the 'because' is demonstrated. Is it proved in the citations or do we just have 'plausibly because'?

Physicists' experience with minimizing free energy has long inspired ML optimization methods. Did physicists playing with free energy lead to new optimization methods, or is it just something people like to talk about?

Jesse Hoogland
The because ends up taking a few dozen pages to establish in Watanabe 2009 (and only after introducing algebraic geometry, empirical processes, and a bit of complex analysis).  Anyway, I thought it best to leave the proof as an exercise for the reader.  I'm not quite sure what you're asking. Like you say, physics has a long history of inspiring ML optimization techniques (e.g., momentum/acceleration and simulated annealing). Has this particular line of investigation inspired new optimization techniques? I don't think so. It seems like the current approaches work quite well, and the bigger question is: can we extend this line of investigation to the optimization techniques we're currently using?

This kind of reply is ridiculous and insulting.

[anonymous]
You may have been socially taught to believe that - to trust your direct opinions or the median from your direct friends - but in terms of rationality - the philosophy of being the least wrong - your strategy is suboptimal. Only by collecting big-n data can you ever really "know" anything. So you may find it emotionally insulting, but at the end of the day correctness matters.

We have good reason to suspect that biological intelligence, and hence human intelligence, roughly follows similar scaling-law patterns to what we observe in machine learning systems.

No, we don't. Please state the reason(s) explicitly.

beren
I'm basing my thinking here primarily off of Herculano-Houzel's work. If you have reasons you think this is wrong, or counterarguments, I would be very interested in them, as this is a moderately important part of my general model of AI.

Google's production search is expensive to change, but I'm sure you're right that it is missing some obvious improvements in 'understanding' a la ChatGPT.

One valid excuse for low quality results is that Google's method is actively gamed (for obvious $ reasons) by people who probably have insider info.

IMO a fair comparison would require ChatGPT to do a better job presenting a list of URLs.

how is a discretized weight/activation set amenable to the usual gradient descent optimizers?

Am8ryllis
Discretized weights/activations are very much not amenable to the usual gradient descent. :) Hence the usual practice is to train in floating point, and then quantize afterwards. Doing this naively tends to cause a big drop in accuracy, but there are tricks involving gradually quantizing during training, or quantizing layer by layer.
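A minimal numpy sketch of the two ideas above (my own illustration, not Am8ryllis's code; the function names and constants are hypothetical): post-training quantization just snaps trained float weights to a grid, while the straight-through-estimator trick runs the forward pass with quantized weights but applies gradient updates to a full-precision copy.

```python
import numpy as np

def quantize(w, bits=8):
    # Uniform symmetric quantization: round onto a grid with
    # 2**(bits-1) - 1 positive levels, scaled to the largest weight.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
w_float = w.copy()                      # keep the full-precision copy

# Post-training quantization: snap the trained float weights to the grid.
w_q = quantize(w_float, bits=4)

# One step of the straight-through-estimator trick: forward pass uses the
# quantized weights, but the gradient update hits the float copy.
x = rng.normal(size=4)
err = w_q @ x - np.ones(4)              # forward with quantized weights
grad = 2.0 * err[:, None] * x[None, :]  # grad of squared error w.r.t. W
w_float = w_float - 0.01 * grad         # update floats; re-quantize next step
```

The gradual / layer-by-layer schemes mentioned above amount to interleaving the `quantize` call into training rather than applying it once at the end.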

You have the profits from the AI tech (+ compute supporting it) vendors and you have the improvements to everyone's work from the AI. Presumably the improvements are more than the take by the AI sellers (esp. if open source tools are used). So it's not appropriate to say that a small "sells AI" industry equates to a small impact on GDP.

But yes, obviously GDP growth climbing to 20% annually and staying there even for 5 years is ridiculous unless you're a takeoff-believer.

You don't have to compute the rotation every time for the weight matrix.  You can compute it once. It's true that you have to actually rotate the input activations for every input but that's really trivial.

Interesting idea.

Obviously doing this instead with a permutation composed with its inverse would do nothing but shuffle the order and not help.

You can easily do the same with any affine transformation, no? Skew, translation (scale doesn't matter for interpretability).

More generally if you were to consider all equivalent networks, tautologically one of them is indeed more input activation => output interpretable by whatever metric you define (input is a pixel in this case?).

It's hard for me to believe that rotations alone are likely to give much improvem…
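The precompute-once point can be sketched in numpy (my illustration; the variable names are hypothetical): absorb the rotation into the weight matrix offline, and the per-input cost is a single extra matvec, with the function computed unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))    # weight matrix of one layer
x = rng.normal(size=5)         # an input activation vector

# Build a random orthogonal matrix R via QR decomposition.
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))

W_rot = W @ R.T                # computed once, offline
x_rot = R @ x                  # computed per input; one cheap matvec

# The rotated network computes the same function: (W R^T)(R x) = W x.
assert np.allclose(W_rot @ x_rot, W @ x)
```

As noted above, a permutation (or any invertible affine map, with the inverse absorbed into the weights the same way) works identically; rotations are just the special case people focus on for interpretability.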

Garrett Baker
You can do this with any normalized, nonzero, invertible affine transformation. Otherwise, you either get the 0 function, get a function arbitrarily close to zero, or are unable to invert the function. I may end up doing this. This will not provide any improvement in training, for various reasons, but mainly because I anticipate there's a reason the network is not in the interpretable basis. Interpretable networks do not actually increase training effectiveness. The real test of this method will be in my attempts to use it to understand what my MNIST network is doing.
Answer by Jonathan_Graehl

If human lives are good, depopulation should not be pursued. If instead you only value avg QOL, there are many human lives you'd want to prevent. But anyone claiming moral authority to do so should be intensely scrutinized.

Answer by Jonathan_Graehl

To sustain high tech-driven growth rates, we probably need (pre-real-AI) an increasing population of increasingly specialized and increasingly long-lived researchers+engineers at every intelligence threshold - as we advance, it takes longer to climb up on giants' shoulders. It's unclear what the needs are for below-threshold population (not zero, yet). Probably Elon is intentionally not being explicit about the eugenic-adjacent angle of the situation.

IMO this project needs an aesthetic leader. A bunch of technically competent people building tools they think might be useful is very likely to result in a bunch of unappealing stuff no one wants.

In Carmack's recent 5+hr interview with Lex Fridman [1], he points out that finding a particular virtual setting that people love and focusing effort on that is usually how we arrive at games/spaces that have historically driven hardware/platform adoption, and that Zucc is very obviously not doing that. The closest successful virtual space to Zucc's approach is Roblox, a kind of social game construction kit (with pretty high market cap), but in his opinion the outcome is usually you build it and they don't come. I believe Carmack also favors the technical re…

This is good thinking. Breaking out of your framework: trainings are routinely checkpointed periodically to disk (in case of crash) and can be resumed - even across algorithmic improvements in the learning method. So some trainings will effectively be maintained through upgrades. I'd say trainings are short mostly because we haven't converged on the best model architectures and because of publication incentives. IMO benefitting from previous trainings of an evolving architecture will feature in published work over the next decade.

jacob_cannell
Was going to make nearly the same comment, so i'll just add to yours: an existing training run can benefit from hardware/software upgrades nearly as much as new training runs. Big changes to hardware&software are slow relative to these timescales. (Nvidia releases new GPU architectures on a two year cadence, but they are mostly incremental). New training runs benefit most from major architectural changes and especially training/data/curriculum changes.

One of the reasons abusers of kids/teens aren't fully prosecuted: parents of victims rightly predict that everyone's knowing you were raped by the babysitter (or whoever) will generate additional psychological baggage, and so they selfishly refrain from protecting other children from the same predator.

How are we ever supposed to believe that enough variables were 'controlled for'?

More abortions -> [lag 15 years] less crime is of course plausible. We should expect smaller families produced by abortion to have more resources available for the surviving children, if any, which plausibly could reduce their criminality. But the hypothesis is clearly also motivated by a belief that we should hope genetically criminal-inclined people differentially have most of the abortions (though I'm sure this motivation is not foregrounded by the authors).

Congrats on the accomplishments. Leaving aside the rest, I like the prompt: why don't people wirehead? Realistically, they're cautious due to having but one brain and a low visibility into what they'd become. A digital-copyable agent would, if curious about what slightly different versions of themselves would do, not hesitate to simulate one in a controlled environment.

Generally I would tweak my brain if it would reliably give me the kind of actions I'd now approve of, while providing at worst the same sort of subjective state as I'd have if managing the s…

Aleksey Bykhun
Technically, we do this all the time. Reading stuff online, talking to people, we absorb their models of the world, their values, and their solutions to problems we face. Hence the Schwarzenegger poster on the wall makes you strong, the countryside folks make you peaceful, and a friend reminding you "you're being a jerk right now" makes you calm down.
Nathan Helm-Burger
I like your comment and think it's insightful about why/when to wirehead or not. Nitpick about your endorsed-skills point: people don't always have high overlap between what they know and what they wish they knew or endorse others knowing. I've had a lifelong obsession with learning, especially with acquiring skills. Unfortunately, my next-thing-to-learn selection is very unguided. It has thus been a thematic struggle in my life to keep focused on learning the things I judge to be objectively valuable. I have a huge list of skills/hobbies I think are mostly or entirely impractical or useless (e.g. artistic woodworking, paleontology). And also lots of things I've been thinking for years that I ought to learn better (e.g. linear algebra). I've been wishing for years that I had a better way to reward myself for studying things I reflectively endorse knowing, rather than wasting time/energy studying unendorsed things. In other words, I'd love a method (like Max Harms' fictional Zen Helmets) to better align my system-1 motivations to my system-2 motivations. The hard part is figuring out how to implement this change without corrupting the system-2 values or its value-discovery-and-updating processes.
calef
You could probably implement this change for less than $5,000 and with minimal disruption to the intersection if you (for example) repainted the lines over night / put authoritative cones around the drying paint. Who will be the hero we need?

A general superhuman AI motivated to obtain monopoly computational power could do a lot of damage. Security is hard. Indeed we'd best lay measures in advance. 'Tool' (we hope) AI will unfortunately have to be part of those measures. There's no indication we'll see provably secure human-designed measures built and deployed across the points of infrastructure/manufacturing leverage.

Most people agree with this, of course, though perhaps not most active users here.

Stuart_Armstrong
Deadweight loss of taxation with perfectly inelastic supply (ie no deadweight loss at all) and all the taxation allocated to the inelastic supply: https://en.wikipedia.org/wiki/Deadweight_loss#How_deadweight_loss_changes_as_taxes_vary I added a comment on that in the main body of the post.

Consider also that activities you find you enjoy, such as LW or Twitter posting, are likely to be judged by you as more useful than they are.  Agree that LW-style is not the only one to think in.  Authors here could give more weight to being easily understood than showing off.

I liked your Qanon-feminist tweet, but we have to remember that something that upsets people by creating dissonance around the mistake you intend (even if they can't pin down the intent) is not as good as actually correcting the mistake. It's certainly easier to create an emotionally jarring contrast around a mistaken belief than to get people to understand+accept an explicit correction, so I can see why you'd enjoy creating the easy+viral.

I haven't seen booster net efficacy assessed in an honest way, since they often exclude events for the first 2 weeks post-boost. Agree that we should expect a small effect only; I would approve for whoever wants and leave it at that.

While I lived through and can confirm the prevalence of the 'extinguish all civilization' MAD narrative, I wonder today how extinguished it actually would have been. (Famine due to a year of reduced sunlight from dust floating around was part of the story.)

ryan_b
My understanding is that at least the United States considered this problem, and made adjustments for it. The nuclear winter problem is much worse for ground detonations, which I already mentioned; air bursts have less impact, while simultaneously having a much more powerful EMP effect. As electronics became more important over time, the latter weighed much more heavily in American thinking on the subject. There was also a general shift towards precision in American weapons development, which included nuclear weapons. This is the line of research that led to tactical nuclear weapons, which have the benefit of fewer side effects like nuclear winter, killing our own troops, etc. As a consequence, my impression is that the everything-except-microbes-dies scenario was never likely, even in the worst period. On the other hand, I now think governments and the attendant international system are quite a bit more fragile; so a general descent into bloody anarchy and the simultaneous loss of civilization's high achievements requires much less damage to achieve.
Radford Neal
Well, I lived through that time too. And there was much talk about not just civilization, but all of humanity, being extinguished (e.g., the novels On the Beach and Level 7). However, though I recall as a teenager thinking that nuclear war was quite likely, and that it would be catastrophic, I did not think (as many did/do) that every last human would die in a nuclear war. That was too obviously contrary to physical intuition. So, there was a lot of 'extinguish all civilization' narrative. But nevertheless, I don't think it was the official line - that was about retaliating by nuking all the Russian military installations. And I think it's quite believable that that really was the policy. If US bases and/or cities have been nuked, it makes sense to try to make sure the Russians don't follow up with an occupying army. It doesn't make sense to also try to kill vast numbers of Russian civilians (though many would die anyway, of course).

https://aviation.stackexchange.com/questions/75411/was-ludwig-wittgensteins-aircraft-propeller-ever-built Imaginative, I suppose. Why is Wittgenstein thought to have contributed anything of worth? Yes, he was clever. Yes, some of his contemporaries praised him.

Owain_Evans
Is this a rhetorical question? What kind of evidence are you looking for?  At this point, it's more efficient to learn about Wittgenstein's contributions by reading more recent works. If you wanted some intro material on Wittgenstein's own work, you could try SEP, Grayling, or Soames [detailed historical development of analytic philosophy] but I haven't looked at these myself.  Also any discussions by Dennett of Wittgenstein on philosophy of mind, Kripke (or McGinn's discussion) on Wittgenstein on rule-following, discussion of family resemblance for concepts in various works. 

Dennett talks about Darwin’s theory of evolution being a “universal acid” that flowed everywhere, dissolved many incorrect things, and left everything we ‘thought’ we knew forever changed. Wittgenstein’s Philosophical Investigations, with its description of language-games and the strong thesis that this is actually the only thing language is, was that for philosophy. Before PI it was reasonable to think that words have intrinsic meanings; after, it wasn’t.

Answer by Jonathan_Graehl

Sniffles don't matter; 10 days after the fever's end seems generous/considerate. Allegedly, positive nasal-swab antigen tests will persist for days after it's impossible to lab-culture the virus from a snot sample, but in any case such tests are definitely negative 14 days after onset.

iceplant
Good to know. I wasn't able to take my temperature, but I felt subjectively "feverish" with mild body aches and bizarre dreams on days 1 and 2 and not after. 10 days after that would put me at 12 days of total isolation. If you have any first or second hand sources you can share I'd love to check them out, but I understand if you can't. 

Aren't rumors typically rounded up for impact in the fashion you caught this someone doing by luck of existing direct knowledge?

NancyLebovitz
Yes. Now how do we sieve good information out of this environment?

Poll inadequacy: zero is not right, but I think the answer to P(hospitalized|covid) is <1%.

cistrane
That depends on age and comorbidities. That probability is highly stratified. There are some populations where P(hospitalized|covid) is >5%.
Answer by Jonathan_Graehl

Sounds like you've imprinted some sort of - not exactly resentment+rejection of the power+value of female sexuality (as I think some gay men have) - but rather frustrated worship+submission to it, congruent with high porn consumption, although you say you don't actually consume much, since the out-and-about powerless-man ogling/frustration stimulus is enough.
This voyeurish mode, and esp. the powerlessness-arousal fetish, doesn't help you pose as the typically high-value 'prize', so the lack of access isn't surprising. As an unsolicited prescription, I'd …

Jonathan_Graehl
Do you like strip clubs?

Cyc's human-input 'knowledge' would indeed be an interesting corpus, but my impression has always been that nothing really useful has come of it to date. I wouldn't pay much for access.

I'm not seeing any difference between pressure and aggression these days.

Trying to push out a revision costs money and doesn't earn any expected money. And everyone knows this is so. Unofficial market collusion regularly manages to solve harder problems; you don't need explicit comms at all.

I'll grant that we'll hear some competitive "ours works better on variant X" marketing but a new even faster approval track would be needed if we really wanted rapid protein updates.

As evhub mentions, the antibodies you make given the first vaccine you're exposed to are what will get manufactured every time you see a similar-enough provocati…

Why are you quoting without correction someone who thinks 5 billion divided by 10 million is 500,000 (it's 500)?

Zvi

Um, I was assuming everyone would understand that this wasn't correct? Are we so far gone I can't do this?

Presumably perfect competition defects from perfect price discrimination.

Chris_Leong
Oops, I meant "high-level"
Chris_Leong
"How-level"?
Answer by Jonathan_Graehl

In general a language model will 'know' the sentence related to the single occurrence of a rare name. I don't think you learn much here if there are enough parameters available to support this memory.

Answer by Jonathan_Graehl

Perhaps GPT-3 has more parameters than are needed to roughly memorize its very large training data. This would be good, since the data contains some low-quality garbage, false claims, etc. (think of them as 'noise'). I believe the GPT-n models are adding parameters faster than training data. Here's my summary of a paper that suggests this is the right move:

https://www.youtube.com/watch?v=OzGguadEHOU Microsoft guy Sebastian Bubeck talking about seemingly overparameterized neural models being necessary for learning (due to label noise?). Validation 'early sto…

Two reasons you could recommend boosters for vulnerable only:

  1. global first doses first thinking
  2. awareness that eradicating covid by rapid vaccination to herd immunity is futile given current effectiveness+adoption, plus hope to reduce the Marek's-disease-like adaptation of more vax-resistant strains, so that the vulnerable can have more of the benefit preserved for them

It does seem that, temporary supply shortages aside, you should advocate universal 'vaccination' (say w/ Moderna) iff you also advocate ongoing doses until a real vaccine is available.

Your contrary cite notwithstanding, I predict Delta will end up less damaging on average, and that more cases will go uncounted due to its mildness. This may also drive some overestimation of its virulence. It does appear to spread well enough that it's a question of when, not if, you'll be exposed.

What are you basing that lower virulence on?

as always the legal term 'minor' is not really germane to the topic people really care about

Everyone wants fewer of these people.

If there's a way there that involves an edit of existing people (including by invasive 'minder' future tech), fine.

Otherwise, prevent them being born or destroy them.

Stuart Anderson
The obvious problem is that everyone wants none of certain kinds of people. We have the statistics about elective abortions of foetuses with Down's Syndrome.  These things always start with the edge cases.

Holders of prepayable loans don't really benefit much when rates drop, so I'll assume you mean bond-like instruments (or ones that aren't likely to be refinanced out of, or that pay some bonus in that event).

Connor_Flexman
I meant that borrowers throughout the economy can refinance at lower rates, which is better for them, which means it's easier for the economy to build new stuff.

surely private installations of the facility will be sold to trade-secret-protecting teams

'If this were true, where are the lawsuits against the vaccine makers?'

Surely they've been shielded from liability so there won't be any.

DPiepgrass
This is possibly outdated, but I saw a publication by "National Research Council (US) Division of Health Promotion and Disease Prevention" from 1985 stating that "A manufacturer who produces and sells a defective vaccine that creates a risk of significant injury to the recipient is liable to any person injured by that defect under the principles stated in section 402A of the Restatement of Torts. This is thought to be the law in every American jurisdiction".
DPiepgrass
If that's true (??), I guess lawsuits would be directed at the FDA instead. It'd be shocking if everybody involved had immunity (against lawsuits, I mean).
TAG
Throughout the world?

To me, 'evil' means 'should be destroyed if possible'. Therefore I don't like to hand out the label recklessly, as it leads generally to impotent rage, which is harmful to me.

orthogenesis
Depends on what you mean by that - as shorthand that the evil (insert person or thing) must be destroyed if possible. You could get rid of something 'evil' by reforming or changing it to be 'non-evil' by whatever means, ones that don't involve literally annihilating it. Unless your definition of an evil thing implies it's unreformable (I don't know if that matches intuition - I can imagine stories where an 'evil' villain sees the light and becomes good) and destruction is the only option.

Is it definitely also a thing that only 1/3 of Long Covid sufferers actually had covid? I think it is (or maybe antibody tests give many false positives).

That seems a bit overconfident. Immunity is one supposed long-term effect. Death is another long-term effect though obviously infrequent in approved vaccines.
