Google AI PM; Foundation board member
I'm probably too conflicted to give you advice here (I work on safety at Google DeepMind), but you might want to think through, at a gears level, what could concretely happen with your work that would lead to bad outcomes. Then you can weigh that against the positives (getting paid, becoming more familiar with model outputs, whatever).
You might also think about how your work compares to whoever would replace you on average, and what implications that might have as well.
This is great data! I'd been wondering about this myself.
Where were you measuring air quality? How far from the stove? Same place every time?
Practicing LLM prompting?
I haven't heard the p-zombie argument before, but I agree that it's at least some Bayesian evidence that we're not in a sim.
Point 3 probably needs to be developed further, but this is the first new piece of evidence I've seen since I first encountered the simulation argument in like 2005.
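For concreteness, here's what "some Bayesian evidence" cashes out to quantitatively. This is a toy sketch with made-up likelihoods (the 0.25 and 0.5 are illustrative placeholders, not anything the parent claimed):

```python
# Toy Bayes update: how one piece of evidence shifts P(we're in a sim).
# The likelihood numbers below are illustrative assumptions only.

def posterior(prior: float, p_e_given_sim: float, p_e_given_real: float) -> float:
    """P(sim | evidence) via Bayes' rule from a prior and two likelihoods."""
    numerator = prior * p_e_given_sim
    denominator = numerator + (1 - prior) * p_e_given_real
    return numerator / denominator

# Suppose the p-zombie consideration is twice as likely to hold if we're
# NOT in a sim. Starting from a 50/50 prior:
print(posterior(0.5, 0.25, 0.5))  # drops below 0.5, i.e. evidence against sim
```

Even weak evidence like this moves the posterior; the direction matters more than the exact magnitude.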
Are we playing the question game because the thread was started by Rosencrantz? Is China doing well in the EV space a bad thing?
Is it the case that the tech would exist without him? I think that's pretty unclear, especially for SpaceX, where despite other startups in the space, nobody else managed to radically reduce the cost per launch in a way that transformed the industry.
Even for Tesla, which seems more pedestrian (heh) now, there were a number of years where they had the only viable electric car on the market. It was only once they proved it was feasible that everyone else piled in.
Progress in ML looks a lot like: we had a different setup with different data and a tweaked algorithm, and did better on this task. If you want to put an asterisk on o3 because it trained in some specific way that's different from previous contenders, then basically every ML advance is going to have a similar asterisk. Seems like a lot of asterisking.
Hm, I think the main thrust of this post misses something, which is that different conditions, even contradictory conditions, can easily hold locally. Obviously, it can be raining in San Francisco and sunny in LA, and you can have one person wearing a raincoat in SF and the other on the beach in LA with no problem, even if they are part of the same team.
I think this is true of wealth inequality.
Carnegie or Larry Page or Warren Buffett got their money in a non-exploitative way, by being better than others at something that was extremely socially valuable. Part of what enables that is living in a society where capital is allocated by markets and there are clear price signals.
But many places in the world are not like this. Assad and Putin amassed their wealth via exploitative and extractive means. Wealth at the top in their societies is a tool of oppression.
I think this geographic heterogeneity implies that you should have one kind of program in the US (e.g. one concerned with market failures around goods with potentially very high negative externalities, like advanced AI) and another in e.g. Uganda, where direct cash transfers (if you are careful to ensure they don't get expropriated by whoever the local oppressors are) could be very high impact.
It seems very strange to me to say that they cheated, when the public training set is intended to be used exactly for training. They did what the test specified! And they didn't even use all of it.
The whole point of the test is that some training examples aren't going to unlock the rest of it. What training definitely does is teach the model how to output the JSON in the right format, and likely how to think about what to even do with these visual puzzles.
Do we say that humans aren't a general intelligence even though for ~all valuable tasks, you have to take some time to practice, or someone has to show you, before you can do it well?
Gemini V2 (1206 experimental, which is the larger model) one-boxes, so... progress?