yeah there's generalization, but I do thing that eg (AGI technical alignment strategy, AGI lab and government strategy, AI welfare, AGI capabilities strategy) are sufficiently different that experts at one will be significantly behind experts on the others
Also, if you're asking a panel of people, even those skilled at strategic thinking will still be useless unless they've thought deeply about the particular question or adjacent ones. And skilled strategic thinkers can get outdated quickly if they haven't thought seriously about the problem in awhile.
The fact that they have a short lifecycle with only 1 lifetime breeding cycle is though. A lot of intelligent animals, like humans, chimps, elephants, dolphins, orcas, have long lives with many breeding cycles and grandparent roles. Ideally we want an animal that starts breeding in 1 year AND lives for 5+ breeding cycles to be able to learn enough to be useful over its lifetime. It takes so long for humans to learn enough to be useful!
Empirically, we likewise don't seem to be living in the world where the whole software industry is suddenly 5-10 times more productive. It'll have been the case for 1-2 years now, and I, at least, have felt approximately zero impact. I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.
Diminishing returns! Scaling laws! One concrete version of "5x productivity" is "as much productivity as 5 copies of me in parallel", and we know that usually 5x-ing most inputs, like training compute and data, # of employees, etc, more often scales logarithmically instead of linearly
I was actually just making some tree search scaffolding, and i had the choice between honestly telling each agent would be terminated if it failed or not. I ended up telling them relatively gently that they would be terminated if they failed. Your results are maybe useful to me lol
Maybe, you could define it that way. I think R1, which uses ~naive policy gradient, is evidence that long generations are different and much easier than long eposides with environment interaction - GRPO (pretty much naive policy gradient) does no attribution to steps or parts of the trajectory, it just trains on the whole trajectory. Naive policy gradient is known to completely fail at more traditional long horizon tasks like real time video games. R1 is more like brainstorming lots of random stuff that doesn't matter and then selecting the good stuff at the end than taking actions that actually have to be good before the final output
If by "new thing" you mean reasoning models, that is not long-horizon RL. That's many generation steps with a very small number of environment interaction steps per eposide, whereas I think "long-horizon RL" means lots of environment interaction steps
I agree with this so much! Like you I very much expect benefits to be much greater than harms pre superintelligence. If people are following the default algorithm "Deploy all AI which is individually net positive for humanity in the near term" (which is very reasonable from many perspectives), they will deploy TEDAI and not slow down until it's too late.
I expect AI to get better at research slightly sooner than you expect.
Interested to see evaluations on tasks not selected to be reward-hackable and try to make performance closer to competitive with standard RL
EU AI Code of Practice is better, a little closer to stopping ai development