I'm somewhat surprised to see the distribution of predictions for 75% on FrontierMath. Does anyone want to bet money on this, at say, 2:1 odds (my two dollars that this won't happen against your one that it will)?
(Edit: I guess the wording doesn’t exclude something like AlphaProof, which I wasn’t considering. I think I might bet 1:1 odds if systems targeted at math are included, as opposed to general purpose models?)
I think you've already given several examples:
...Should I count the people I spoke to for 15 minutes for free at the imbue potlucks? That was year-changing for at least one. But if I count them I have to count all of the free people ever, even those who were uninvested. Then people will respond “Okok, how many bounties have you taken on?” Ok sure, but should I include the people who I told “Your case is not my specialty, idk if i’ll be able to help, but I'm interested in trying for a few hours if you’re into it”? Should I include the people who had an amazing
Please, tell me what metric I should use here!
Is it feasible to just generate a bunch of such metrics, with details about what was included or not included in a particular number, and share all of them?
Hazarding a guess from the frame of 'having the most impact' and not of 'doing the most interesting thing':
Additional major epidemics or scares that didn’t pan out ($50 for first few, $25 for later)
2014-15 HPAI outbreak in the US, which didn't ultimately make it to humans
I want to add two more thoughts to the competitive deliberate practice bit:
Another analogy for the scale of humanity point:
If you try to get better at something but don't have the measuring sticks of competitive games, you end up not really knowing how good you objectively are. But most people don't even try to get better at things. So you can easily find yourself feeling like whatever local optimum you've ended up in is better than it is.
I don't know anything about martial arts, but suppose you wanted to get really good at fighting people. Then an a...
It is a bit early to tell and seems hard to accurately measure, but I note some concrete examples at the end.
Concrete examples aside, in plan making it's probably more accurate to call it purposeful practice than deliberate practice, but it seems super clear to me that in ~every place where you can deliberately practice, deliberate practice is just way better than whatever your default is of "do the thing a lot and passively gain experience". It would be pretty surprising to me if that mostly failed to be true of purposeful practice for plan making or other metacognitive skills.
...As a concrete example, as far as I can piece together from various things I have heard, Open Phil does not want to fund anything that is even slightly right of center in any policy work. I don't think this is because of any COIs, it's because Dustin is very active in the democratic party and doesn't want to be affiliated with anything that is even slightly right-coded. Of course, this has huge effects by incentivizing polarization of AI policy work with billions of dollars, since any AI Open Phil funded policy organization that wants to engage with people
Yep, my model is that OP does fund things that are explicitly bipartisan (like, they are not currently filtering on being actively affiliated with the left). My sense is that in practice it's a fine balance, and if there was some high-profile thing where Horizon became more associated with the right (like maybe some alumnus becomes prominent in the Republican Party and very publicly credits Horizon for that, or there is some scandal involving someone on the right who is a Horizon alumnus), then I do think their OP funding would have a decent chance of being jeopar...
Do you have any data on whether outcomes are improving over time? For example, % published / employed / etc 12 months after a given batch
I agree! This is mostly focused on the "getting a job" part though, which typically doesn't end up testing those other things you mention. I think this is the thing I'm gesturing at when I say that there are valid reasons to think that the software interview process feels like it's missing important details.
This might look like building influence / a career in the federal orgs that would be involved in nationalization, rather than a startup. Seems like positioning yourself to be in charge of nationalized projects would be the highest impact?
Your GitHub link is broken; it includes the period in the URL.
I
Love
Interesting
Alignment
Donferences
ah that makes sense thanks
I spoke with some people last fall who were planning to do this; perhaps it's the same people. I think the idea (at least, as stated) was to commercialize regulatory software to fund some alignment work. At the time, they were going by Nomos AI, and it looks like they've since renamed to Norm AI.
+ the obvious fact that it might matter to the kid that they're going to die
(edit: fwiw I broadly think people who want to have kids should have kids)
I'm sure this varies by kid, but I just asked my two older kids, age 9 and 7, and they both said they're very glad that we decided to have them even if the world ends and everyone dies at some point in the next few years.
Which makes lots of sense to me: they seem quite happy, and it's not surprising they would be opposed to never getting to exist even if it isn't a full lifetime.
Hmm, I have exactly one idea. Are you pressing shift+enter to new line? For me, if I do shift+enter
>! I don't get a spoiler
But if I hit regular enter then type >!, the spoiler tag pops up as I'm typing (don't need to wait to submit the question for it to appear)
Are you thinking of
Until Dawn?
(also it seems like I can get a spoiler tag to work in comments by starting a line with >! but not by putting text into :::spoiler [text] :::)
Interesting, thanks for the detailed responses here and above!
Here's a handwavy attempt from another angle:
Suppose you have a container of gas and you can somehow run time at 2x speed in that container. It would be obvious that, from an external observer's point of view (where time is running at 1x speed), sound would appear to travel 2x as fast from one end of the container to the other. But to the external observer, running time at 2x speed is indistinguishable from doubling the velocity of each gas molecule at 1x speed. So increasing the velocity of molecules (and therefore the temperature) should cause sound t...
If I make the room bigger or smaller while holding T and P constant, v(sound) does not change. If it did, it would be very obvious in daily life.
This feels a bit too handwavy to me. I could say the same thing about temperature: if the speed of sound were affected by making a room hotter or colder, it would be very obvious in daily life, therefore the speed of sound doesn't depend on temperature. But it isn't obvious in daily life that the speed of sound changes based on temperature either.
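For a rough sense of scale, here is a minimal sketch using the standard ideal-gas result c = sqrt(γRT/M). The function name and the constants for dry air (γ ≈ 1.4, M ≈ 0.0289647 kg/mol) are my own assumptions for illustration, not something from the thread. Under that model the room's volume never enters, and going from freezing to room temperature only shifts the speed by a few percent, which fits with it not being obvious in daily life.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def speed_of_sound(temp_kelvin, gamma=1.4, molar_mass=0.0289647):
    """Ideal-gas speed of sound in m/s.

    gamma and molar_mass default to approximate values for dry air
    (assumed here for illustration). Volume and pressure do not appear.
    """
    return math.sqrt(gamma * R * temp_kelvin / molar_mass)

if __name__ == "__main__":
    for celsius in (0, 20, 40):
        c = speed_of_sound(celsius + 273.15)
        print(f"{celsius:>3} C: {c:.1f} m/s")

    # Doubling every molecule's velocity quadruples T (T scales with v^2),
    # which doubles the speed of sound under this model.
    ratio = speed_of_sound(4 * 293.15) / speed_of_sound(293.15)
    print(f"ratio at 4x temperature: {ratio:.3f}")
```

This gives roughly 331 m/s at 0 °C and 343 m/s at 20 °C, and the last line is consistent with the "run time at 2x speed" picture: doubling molecular velocities doubles the sound speed.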
...So now let's increase T. It doesn't matter what effect this has on P
Worth noting that the scam attempt failed. We keep hearing ‘I almost fell for it’ and keep not hearing from anyone who actually lost money.
Here's a story where someone lost quite a lot of money through an AI-powered scam:
https://www.reuters.com/technology/deepfake-scam-china-fans-worries-over-ai-driven-fraud-2023-05-22/
We can question things, how it went this way or why we are all here with this problem now - but it does not add anything IMHO.
I think it adds something. It's a bit strongly worded, but another way to see this is "could we have done any better, and if so, why?" Asking how we could have done better in the past lets us see ways to do better in the future.
This post comes to mind as relevant: Concentration of Force
The effectiveness of force application often depends on its concentration—on whether you can amass locally superior force at the actual decisive moment.
As someone who is definitely not a political expert (and not from or super familiar with the UK), my guess would be that you just can't muster up enough political capital or will to try again. Taxpayer money (in the US at least) seems highly scrutinized, you typically can't just fail with a lot of money and have no one say anything about it.
So then if the first try does fail, then it requires more political capital to push for allocating a bunch of money again, and failing again looks really bad for anyone who led or supported that effort. Politician...
Is it possible to purchase the 2018 annual review books anywhere? I can find an Amazon link for the 2019 in stock, but the 2018 is out of stock (is that indefinite?).
Re: "up-skilling": I think this is underestimating the value of developing maturity in an area before trying to do novel research. These are two separate skills, and developing both simultaneously from scratch doesn't seem like the fastest path to proficiency to me. Difficulties often multiply.
There is a long-standing certification for "proving you've learned to do novel research", the PhD. A prospective student would find it difficult to enter a grad program without any relevant coursework, and it's not because those institutions think they have equal chances of success as a student who does.
I think it's more fair to say humans were "trained" over millions of years of transfer learning, and an individual human is fine-tuned using much less data than Chinchilla.
Can we join the race to create dangerous AGI in a way that attempts to limit the damage it can cause, while still allowing it to cause enough damage to move other pivotal acts into the Overton window?
If the first AGI created is designed to give the world a second chance, it may be able to convince the world that a second chance should not happen. Obviously this could fail and just end the world earlier, but it would certainly create a convincing argument.
In the early days of the pandemic, even though all the evidence was there, virtually no one cared about covid until it was knocking on their door, and then suddenly pandemic preparedness seemed like the most obvious thing to everyone.
Yeah I think I would still make this bet. I think I would still count o3's 25% for the purposes of such a bet.