zef
Founder, agent infrastructure and interfaces at Fulcrum Research, https://fulcrum.inc
Past: MIT, MAIA, theory of deep learning research
More info: https://uzpg.me
> "Knowing is not enough; we must apply. Willing is not enough; we must do."
>
> — Johann Wolfgang von Goethe

In our previous post, we introduced inverse rubric optimization (IRO): tasks where an agent must learn the preferences of a black-box judge under a label budget. These are...
Thanks to Megan Kinniment for helpful comments and discussion, and to Jean-Stanislas Denain for helpful comments and pointers to past work. TL;DR: We claim that useful task attributes for forecasting AI capabilities should be measurable, interpretable, stable in their trends over time, and sufficient to explain task difficulty. task.human_completion_time (human...
Thanks to Megan Kinniment for helpful comments and discussion. TL;DR: Benchmarks like HCAST undersample fuzzy (hard to evaluate) tasks, meaning they might overestimate capability on long-horizon work. To sample fuzzy tasks we need to increase judge capacity: we can either try to build automated judges that match human judgment, or...
Software is made of information flows

Software encodes information flows. An ERP system, for instance, takes procurement and locks it into a specific sequence of purchase orders, approval routing, invoice matching, and payment release. Git takes multiple people changing code and imposes a protocol of branching, diffing, reviewing, and merging....
Why did software change the world? In the 1900s, much of the work being done by knowledge workers was computation: searching, sorting, calculating, tracking. Software made this work orders of magnitude cheaper and faster. Naively, one might expect businesses and institutions to carry out largely the same processes, just more...
We are in the time of new human-AI interfaces. AIs are becoming the biggest producers of tokens, and humans need ways to manage all this useful labor. Most breakthroughs come first in coding, because the coders build the tech and iterate on how good it is at the same time, very...