I'm not convinced that these were bad predictions for the most part.
The main prediction: 1) China lacks compute. 2) The CCP values stability and control. Therefore, China will not be the first to build unsafe AI/AGI.
Both of these premises are unambiguously true as far as I'm aware. So, if these predictions were bad, that would suggest we now believe China is likely to build AGI before the USA, without realizing it threatens stability/control, and with minimal compute? All while refusing to agree to any sort of deal to slow down? Why? That seems unlikely.
American companies, on the other hand, are still explicitly racing toward AGI, are incredibly well resourced, have strong government support, and have a penchant for disruption. The current administration also cares less about stability than any other in recent history.
So, from my perspective, USA racing to AGI looks even more dangerous than before, almost desperate. Whereas China is fast following, which I think everyone expected? Did anyone suggest that China would not be able to fast-follow American AI?
I keep some folders (and often some other transient files) on my desktop and pin my main apps to the taskbar. With apps pinned to your taskbar, you can open a new instance with Windows+Shift+number (or just Windows+number if the app isn't open yet), where the number is the app's position on the taskbar.
I do the same as you and search for any other apps that I don't want to pin.
Well, vision and mapping seem like they could be pretty generic (and I expect much better vision in future base models anyway). For the third limitation, I think it's quite possible that Claude could provide an appropriate segmentation strategy for whatever environment it is told it is being placed into.
Whether this would be a display of its intelligence, or just its capabilities, is beside the point from my perspective.
But these issues seem far from insurmountable, even with current tech. It is just that they are not actually trying, because they want to limit scaffolding.
From what I've seen, the main issues:
1) Poor vision -> Can be improved through tool use, and will surely improve greatly with new models regardless
2) Poor mapping -> Can be improved greatly + straightforwardly through tool use
3) Poor executive function -> I feel like this would benefit greatly from something like a separation of concerns. Currently my impression is that Claude gets overwhelmed with context, loses track of what's going on, and then starts messing with its long-term planning. From a clean context, its long-term planning seems fairly decent. The same goes for loops: I would expect a clean-context Claude could read a summary of recent steps constituting a loop, understand that it is in a loop, and realize that it needs to try something else.
E.g., separate contexts for each of battling, navigation, summarization, long-term planning, coordination, etc.
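To make the separation-of-concerns idea a bit more concrete, here is a minimal sketch, assuming a design where each concern keeps its own small, bounded context and a coordinator maintains a shared step summary that a clean-context planner could check for loops. The names SubAgent, Coordinator, and looks_like_loop are hypothetical, no actual model is called, and the loop check is a deliberately crude stand-in for what a clean-context Claude might notice from a summary.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """One concern (battling, navigation, ...) with its own isolated context. Hypothetical design."""
    name: str
    max_context: int = 20  # keep each context small so it stays "clean"
    context: list = field(default_factory=list)

    def observe(self, message: str) -> None:
        self.context.append(message)
        # Drop the oldest entries so this concern never accumulates clutter.
        del self.context[:-self.max_context]


def looks_like_loop(recent_steps: list, window: int = 3) -> bool:
    """Crude loop check over a step summary: are the last `window` steps
    an exact repeat of the `window` steps just before them?"""
    if len(recent_steps) < 2 * window:
        return False
    return recent_steps[-window:] == recent_steps[-2 * window:-window]


class Coordinator:
    """Routes each message to exactly one concern and keeps a short shared summary."""

    def __init__(self, concerns):
        self.agents = {name: SubAgent(name) for name in concerns}
        self.step_summary = []

    def route(self, concern: str, message: str) -> None:
        self.agents[concern].observe(message)
        self.step_summary.append(f"{concern}: {message}")

    def needs_replan(self) -> bool:
        # A planner reading only this summary can flag the loop without
        # wading through every concern's full context.
        return looks_like_loop(self.step_summary)
```

For example, routing the same three navigation steps twice makes needs_replan() return True, while the battling context stays empty. A real setup would replace the exact-match check with a model reading the summary.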
I interpret the main argument as:
You cannot predict the direction of policy that would result from certain discussions/beliefs
The discussions improve the accuracy of our collective world model, which is very valuable
Therefore, we should have the discussions first and worry about policy later.
I agree that in many cases there will be unforeseen positive consequences as a result of the improved world model, but in my view, it is obviously false that we cannot make good, directionally correct predictions of this sort for many X. And the negative will clearly outweigh the positive for some large group in many cases. In those cases, the question is how much you are willing to sacrifice for the collective knowledge.
If you want to highlight people who handle this well, the only interesting case is people from group A in favor of discussing X, where X is presumed to lead to Y and Y negatively impacts A. Piper's X has a positive impact on her beliefs (discussing solutions to falling birth rates as someone who believes they are a problem), and Caplan's X has a positive impact on him (he is obviously high IQ), so neither is an interesting sample. Neither has any inherent reason to want to avoid discussing their X. Even worse, Caplan's rejected "Y" is a clear strawman, which begs the question and actually negatively updates me on his beliefs. More realistic Ys are things like IQ-based segregation, resource allocation, reproductive policies, etc.
If I reject these Ys for ideological reasons, and the middle ground looks like what I think it looks like, I do not want to expose the middle ground.
I agree with you that people like him do a service to prediction markets: contributing a huge amount of either liquidity or information. I don't agree with you that it is clear which one he is providing, especially considering the outcome. He did also win his popular vote bet, which was hovering around, I'm not sure, ~20% most of the time?
I think he (Theo) probably did have a true probability around 80% as well. That's what it looks like at least. I'm not sure why you would assume he should be more conservative than Kelly. I'm sure Musk is not, as one example of a competent risk-taker.
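For reference, the Kelly criterion for a binary contract that costs `price` and pays 1 if it resolves YES gives a stake of f* = (b·p − (1 − p)) / b, with net odds b = (1 − price) / price, which simplifies to (p − price) / (1 − price). "More conservative than Kelly" means staking less than this, e.g. half-Kelly. A minimal sketch follows; the function name and the example numbers are illustrative assumptions, not Theo's actual positions.

```python
def kelly_fraction(p: float, price: float) -> float:
    """Kelly-optimal fraction of bankroll to stake on a binary contract.

    p:     your subjective probability that the contract resolves YES
    price: market price of the contract, which pays 1 on YES and 0 on NO
    """
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    b = (1 - price) / price          # net odds received per unit staked
    f = (b * p - (1 - p)) / b        # standard Kelly formula f* = (bp - q) / b
    return max(f, 0.0)               # no edge means no bet


# Illustrative only: an 80% belief against hypothetical market prices.
full = kelly_fraction(0.8, 0.5)      # roughly 0.6 of bankroll
half = full / 2                      # a more conservative half-Kelly stake
```

With an 80% belief, Kelly already recommends a large fraction of bankroll whenever the market price sits well below 80%, so a large bet on its own is not evidence of recklessness.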
A few glaring issues here:
1) Does the question imply causation or not? It shouldn't.
2) Are these stats intended to be realistic such that I need to consider potential flaws and take a holistic view or just a toy scenario to test my numerical skills? If I believe it's the former and I'm confident X and Y are positively correlated, a 2x2 grid showing X and Y negatively correlated should of course make me question the quality of your data proportionally.
3) Is this an adversarial question such that my response may be taken out of context or otherwise misused?
The sample interviews from Veritasium did not seem to address any of these issues:
(1) They seemed to cut out the gun question, but the skin cream question implied causation: "Did the skin cream make the rash better or worse?"
(2) One person mentioned "I wouldn't have expected that...", which implies he thought it was real data.
(3) The last person clearly interpreted it adversarially.
In the original study, the question was stated as "cities that enacted a ban on carrying concealed handguns were more likely to have a decrease in crime." This framing is not as bad, but still too close to implying causation in my opinion.
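For anyone unfamiliar with the task, the correct way to read the 2x2 grid is to compare the rate of improvement within each row rather than the raw "improved" counts. Here is a minimal sketch with hypothetical counts (not the study's actual numbers) chosen so that the raw counts and the rates point in opposite directions.

```python
def improvement_rates(improved_treated: int, worse_treated: int,
                      improved_control: int, worse_control: int):
    """Return the share of cases that improved in each group of a 2x2 grid."""
    rate_treated = improved_treated / (improved_treated + worse_treated)
    rate_control = improved_control / (improved_control + worse_control)
    return rate_treated, rate_control


# Hypothetical grid: more treated cases improved in absolute terms (200 vs 100)...
treated, control = improvement_rates(200, 80, 100, 20)
# ...but a smaller share of them improved (about 0.71 vs about 0.83),
# so the data show a negative association between treatment and improvement.
```

Note that this only establishes association; whether the treatment caused the difference is exactly what the question wording should avoid implying.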
I do not really understand your framing of these three "dimensions". The way I see it, they form a dependency chain. If either of the first two is concentrated, whoever controls it can easily cut off access during takeoff (and I would expect them to). If both of the first two are diffuse, the third will necessarily also be diffuse.
How could one control AI without access to the hardware/software? What would stop one with access to the hardware/software from controlling AI?
I've updated my comment. You are correct as long as you pre-commit to a single answer beforehand, not if you are making the decision after waking up. The only reason pre-committing to heads works, though, is because it completely removes the Tuesday interview from the experiment. She will no longer be awoken on Tuesday, even if the result is tails. So, this doesn't really seem to be in the spirit of the experiment in my opinion. I suppose the same pre-commit logic holds if you say the correct response gets (1/coin-side-wake-up-count) * value per response though.
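The weighting mentioned at the end can be checked directly. Below is a minimal sketch, assuming the standard setup (heads: one awakening; tails: two) and a fixed policy of giving the same answer at every awakening. It does not model the pre-commit variant that removes the Tuesday interview. It compares expected reward per run when each correct response earns a flat value versus value divided by the number of wake-ups for that coin side; `expected_payoff` and `WAKEUPS` are hypothetical names.

```python
from fractions import Fraction

# Awakenings per coin outcome in the standard Sleeping Beauty setup.
WAKEUPS = {"heads": 1, "tails": 2}


def expected_payoff(answer: str, weighted: bool, value: Fraction = Fraction(1)) -> Fraction:
    """Expected total reward per run for a policy that always answers `answer`.

    weighted=False: each correct response earns `value`.
    weighted=True:  each correct response earns value / (wake-ups for that coin side).
    """
    total = Fraction(0)
    for side, wakeups in WAKEUPS.items():
        if side != answer:
            continue  # every response on this branch is wrong and earns nothing
        per_response = value / wakeups if weighted else value
        total += Fraction(1, 2) * wakeups * per_response  # fair coin
    return total


flat = {a: expected_payoff(a, weighted=False) for a in WAKEUPS}
weighted = {a: expected_payoff(a, weighted=True) for a in WAKEUPS}
# flat:     tails earns twice as much as heads (1 vs 1/2)
# weighted: both answers earn 1/2, so the extra Tuesday response no longer pays
```

Under flat per-response scoring, the extra tails awakening rewards answering tails; dividing by the wake-up count cancels that extra reward.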
I think LW consensus has been that the main existential risk is AI development in general. The only viable long-term option is to shut it all down, or at least slow it down as much as possible until we can come up with better solutions. From my perspective, DeepSeek should incentivize slowing down development (if you accept the fast-follower dynamic, and also by reducing profit margins generally), and I believe it has.
Anyway, I don't see how this relates to these predictions. The predictions are about China's interest in racing to AGI. Do you believe China would now rather have an AGI race with USA than agree to a pause?