
MathiasKB — LessWrong

One thing to highlight, which I only learned recently, is that the norm when submitting letters to the governor on any bill in California is to include "Support" or "Oppose" in the subject line to clearly state the company's position.

Anthropic importantly did NOT include "support" in the subject line of the second letter. I don't know how to read this as anything other than Anthropic not supporting SB1047.


Replying to The Online Sports Gambling Experiment Has Failed

MathiasKB · 1y


I'll crosspost the comment I left on substack:

In Denmark the government has a service (ROFUS), which anyone can voluntarily sign up for to exclude themselves from all gambling providers operating in Denmark. You can exclude yourself for a limited duration or permanently. The decision cannot be revoked.

Before discussing whether gambling should be legal or illegal, I would encourage Americans to see how far they can get with similar initiatives first.

MathiasKB · 1y · Quick Take

Is there any good write-up on the gut/brain connection and the effect of fecal transplants?

Watching the South Park episode where everyone tries to steal Tom Brady's poo got me wondering why this isn't actually a thing. I can imagine lots of possible explanations, ranging from "because it doesn't have much of an effect if you're healthy" to "because FDA".

Replying to Ironing Out the Squiggles

MathiasKB · 1y


On this view, adversarial examples arise from gradient descent being "too smart", not "too dumb": the program is fine; if the test suite didn't imply the behavior we wanted, that's our problem.

Shouldn't we then expect RL models trained purely on self-play not to have these issues?

My understanding is that even models trained primarily with self-play, such as KataGo, are vulnerable to adversarial attacks. If RL models are vulnerable to the same type of adversarial attacks, isn't that evidence against this theory?

MathiasKB · 1y

The amount of inference compute isn't baked-in at pretraining time, so there is no tradeoff.

This doesn't make sense to me.

In a subscription-based model, for example, companies would want to give users the strongest completions for the least amount of compute.

If they estimate customers will use 1 quadrillion tokens in total before the release of their next model, they have to decide how much of their compute to dedicate to training versus inference. As the parameters change (subscription price, anticipated users, fixed costs for a training run, etc.), you'd expect the optimal ratio to change.
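The trade-off can be made concrete with a toy model. The sketch below assumes a Chinchilla-style loss fit (the commonly quoted approach-3 constants from Hoffmann et al. 2022), roughly 6·N·D FLOPs to train an N-parameter model on D tokens, and roughly 2·N FLOPs per token served. The 1e25 FLOP budget, the token counts, and the helpers `loss` and `best_split` are hypothetical choices for illustration; the model ignores prices, fixed costs, and the gap between loss and usefulness.

```python
# Toy model of the training-vs-inference trade-off. Everything here is an assumption:
# a Chinchilla-style loss fit, ~6*N*D FLOPs to train, ~2*N FLOPs per inference token.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28  # Hoffmann et al. (2022) fit


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def best_split(total_flops: float, inference_tokens: float):
    """Grid-search model size N for a fixed total compute budget.

    Serving costs 2 * N * inference_tokens FLOPs; whatever is left goes to training.
    Returns (N, training tokens D, fraction of budget spent on training, loss).
    """
    best = None
    for i in range(2000):
        n = 10 ** (8 + 5 * i / 1999)  # model sizes from 1e8 to 1e13 parameters
        train_flops = total_flops - 2 * n * inference_tokens
        if train_flops <= 0:
            break  # this model (and every larger one) can't even pay for its own inference
        d = train_flops / (6 * n)
        candidate = (n, d, train_flops / total_flops, loss(n, d))
        if best is None or candidate[3] < best[3]:
            best = candidate
    return best


for tokens in (1e12, 1e13, 1e14, 1e15):
    n, d, train_share, _ = best_split(1e25, tokens)
    print(f"{tokens:.0e} inference tokens: N={n:.1e}, D={d:.1e}, training share={train_share:.0%}")
```

Under these assumptions the optimal training share falls as expected usage grows: more anticipated inference pushes toward a smaller model trained on more data, so the ratio is not fixed at pretraining time but depends on the usage forecast.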

Test-time compute on one trace comes with a recommendation to cap

...

MathiasKB · 1y

Thanks!! This is exactly what I was looking for.

MathiasKB · 1y · Quick Take

With the release of OpenAI o1, I want to ask a question I've been wondering about for a few months.

Like the Chinchilla paper, which estimated the compute-optimal ratio of training data to model size, are there any similar estimates for the optimal ratio of compute to spend on inference vs. training?

In the release they show this chart:

The chart somewhat gets at what I want to know, but doesn't answer it completely. How much additional inference compute would a 1e25 o1-like model need to perform as well as a one-shotted 1e26 model?

Additionally, for some number of queries x, what is the optimal ratio of compute to spend on training versus inference? How does that change for different values of x?

Are there any public attempts at estimating this stuff? If so, where can I read about it?

MathiasKB's Shortform

MathiasKB

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Replying to Poker is a bad game for teaching epistemics. Figgie is a better one.

MathiasKB · 2y


If someone wants to set up a Figgie group to play, I'd love to join.

Replying to Priors and Prejudice

MathiasKB · 2y


I agree the conclusion isn't great!

Not so surprisingly, many people read the last section as an endorsement of some version of "RCTism", but it's not actually a view I endorse myself.

What I really wanted to get at in this post was just how pervasive priors are, and how difficult it is to see past them.

Replying to D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset

MathiasKB · 2y


Just played through it tonight. This was my first D&D.Sci; I found it quite difficult and learned a few things while working on it.

Initially I tried to figure out the best counters and found a few patterns (flamethrowers were especially good against certain units). I then tried to look for and adjust for any chronological effects, but after tinkering around for a while without getting anywhere, I gave up on that. Eventually I just went with a pretty brainless ML approach.

I ended up sending squads for 5 and 6, which had a 13.89% and a 53.15% chance of surviving, respectively. I think it's good I'm not in charge of any soldiers in real life!

Overall I had good fun, and I'm looking forward to the next one.

Priors and Prejudice

MathiasKB

I

Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let’s name this hypothetical movement the Effective Samaritans.

Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping.

But many Effective Samaritans were starting to wonder. Is this randomista approach really the most prudent? After all, Scandinavia didn't become wealthy and equitable through marginal charity. Societal transformation comes from uprooting oppressive power structures.

The Scandinavian societal model which lifted the working class, brought...


H5N1 - thread for information sharing, planning, and action

MathiasKB

Hi everyone,

I've been reading up on H5N1 this weekend, and I'm pretty concerned. Right now my ~~estimate~~ hunch is that there is a 5% non-zero chance that it will cost more than 10,000 people their lives.

To be clear, I think it is unlikely that H5N1 will become a pandemic anywhere close to the size of covid.

Nevertheless, I think our community should be actively following the news and start thinking about ways to be helpful if the probability increases. I am creating this thread as a place where people can discuss and share information about H5N1. We have a lot of pandemic experts in this community, so please chime in!

Resources

Articles

https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.3.2300001 (paper showing H5N1 has

...

Getting GPT-3 to predict Metaculus questions

MathiasKB

Can GPT-3 predict real-world events? To answer this question, I had GPT-3 predict the likelihood of every binary question ever resolved on Metaculus.

Predicting whether an event is likely or unlikely to occur often boils down to using common sense. It doesn't take a genius to figure out that "Will the sun explode tomorrow?" should get a low probability. Not all questions are that easy, but for many questions common sense can bring us surprisingly far.

Experimental setup

Through the Metaculus API, I downloaded every binary question posed on Metaculus.
I then filtered them down to only the non-ambiguously resolved questions, resulting in this list of 788 questions.

For these questions, the community's Mean Squared Error was...
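For binary questions, the mean squared error between a forecast probability and the 0/1 outcome is the Brier score. Below is a minimal sketch of the filter-then-score steps described above, assuming a hypothetical `Question` record whose `resolution` is the string "yes", "no", or "ambiguous"; these field names are made up for illustration and are not the Metaculus API schema, and the toy questions are invented.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical fields for illustration; not the actual Metaculus API schema.
    title: str
    resolution: str    # assumed to be "yes", "no", or "ambiguous"
    prediction: float  # forecast probability that the question resolves "yes"


def non_ambiguous(questions):
    """Keep only questions that resolved cleanly to yes or no."""
    return [q for q in questions if q.resolution in ("yes", "no")]


def mean_squared_error(questions):
    """Brier-style score: mean of (p - outcome)^2, with outcome 1 for yes and 0 for no."""
    resolved = non_ambiguous(questions)
    if not resolved:
        raise ValueError("no non-ambiguously resolved questions")
    total = sum(
        (q.prediction - (1.0 if q.resolution == "yes" else 0.0)) ** 2 for q in resolved
    )
    return total / len(resolved)


toy = [
    Question("Will the sun explode tomorrow?", "no", 0.01),
    Question("Toy question B", "yes", 0.8),
    Question("Toy question C", "ambiguous", 0.3),  # dropped by the filter
]
print(mean_squared_error(toy))
```

Lower is better. A forecaster who always answers 50% scores 0.25 on every question, which is a useful baseline when comparing GPT-3 against the community.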