What fraction of economically-valuable cognitive labor is already being automated today?
Did e.g. a telephone operator in 1910 perform cognitive labor, by the definition we want to use here?
Oh, indeed I was getting confused between those. So as a concrete example of your proof sketch, we could consider the following degenerate case:
from typing import Optional

def f(N: int) -> int:
    # Degenerate objective: 1 for a single hard-coded input, 0 everywhere else.
    if N == 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a:
        return 1
    else:
        return 0

def check(x: int, y: float) -> bool:
    # Verifier: does input x achieve an objective value of at least y?
    return f(x) >= y

def argsat(y: float, max_search: int = 2**64) -> Optional[int]:
    # We postulate that we have this function because P=NP
    if y > 1:
        return None
    elif y <= 0:
        return 0
    else:
        return 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a
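For instance (my addition, just to make the intended usage concrete), the degenerate example can be sanity-checked like this:

# Sanity check: argsat should hand back an input that passes check.
x = argsat(0.5)
assert x is not None and check(x, 0.5)   # f(x) == 1 >= 0.5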
But we could also replace our degenerate f with e.g. sha256.
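For concreteness, a sha256-based version of f might look something like the sketch below (my illustration; it reuses the hex constant above as the target digest and assumes non-negative integer inputs):

import hashlib

TARGET_DIGEST = "855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a"

def f_sha256(N: int) -> int:
    # 1 iff sha256 of N's big-endian byte representation matches the target digest.
    data = N.to_bytes((N.bit_length() + 7) // 8 or 1, "big")
    return 1 if hashlib.sha256(data).hexdigest() == TARGET_DIGEST else 0

An argsat for this f would amount to finding a sha256 preimage of the target digest, which is exactly the kind of search the postulated P=NP oracle is supposed to make tractable.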
Is that the gist of your proof sketch?
Finding the input x such that x == argmax(f) is left as an exercise for the reader, though.
Is Amodei forecasting that, in 3 to 6 months, AI will produce 90% of the value derived from written code, or just that AI will produce 90% of code, by volume? It would not surprise me if 90% of new "art" (defined as non-photographic, non-graph images) by volume is currently AI-generated, and I would not be surprised to see the same thing happen with code.
And in the same way that "AI produces 90% of art-like images" is not the same thing as "AI has solved art", I expect "AI produces 90% of new lines of code" is not the same thing as "AI has solved software".
I'm skeptical.
Did the Sakana team publish the code that their scientist agent used to write the compositional regularization paper? The post says
For our choice of workshop, we believe the ICBINB workshop is a highly relevant choice for the purpose of our experiment. As we wrote in the main text, we selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning, unlike most workshops with a narrow focus on one topic.
This workshop focuses particularly on understanding limitations of deep learning methods applied to real world problems, and encourages participants to study negative experimental outcomes. Some may criticize our choice of a workshop that encourages discussion of “negative results” (implying that papers discussing negative results are failed scientific discoveries), but we disagree, and we believe this is an important topic.
and while it is true that "negative results" are important to report, "we report a negative result because our AI agent put forward a reasonable and interesting hypothesis, competently tested the hypothesis, and found that the hypothesis was false" looks a lot like "our AI agent put forward a reasonable and interesting hypothesis, flailed around trying to implement it, had major implementation problems, and wrote a plausible-sounding paper describing its failure as a fact about the world rather than a fact about its skill level".
The paper has a few places with giant red flags where it seems that the reviewer assumes that there were solid results that the author of the paper was simply not reporting skillfully, for example in section B2.
I favor an alternative hypothesis: the Sakana agent determines where a graph belongs, what would be on the X and Y axes of that graph, what it expects the graph to look like, and how to generate it. It then generates the graph and inserts the caption the graph would warrant if its hypothesis were correct. The agent has no particular ability to notice that its caption doesn't match the graph it actually generated.
Plausibly going off into the woods decreases the median output while increasing the variance.
Has anyone trained a model to, given a prompt-response pair and an alternate response, generate an alternate prompt which is close to the original and causes the alternate response to be generated with high probability?
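To be more precise about what I mean, here's a rough formalization of the objective (hypothetical; the log_prob and distance callables are placeholders, not any real library's API):

def alternate_prompt_objective(candidate_prompt: str,
                               original_prompt: str,
                               alternate_response: str,
                               log_prob,       # callable: log P_model(response | prompt)
                               distance,       # callable: penalty for drifting between prompts
                               lam: float = 1.0) -> float:
    # Score a candidate prompt: high when the model is likely to produce the
    # alternate response from it, low when it drifts far from the original prompt.
    return log_prob(alternate_response, candidate_prompt) - lam * distance(original_prompt, candidate_prompt)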
I ask this because a quick search found some vaguely adjacent research, but nothing I'd rate as a super close match.
If this research really doesn't exist I'd find that surprising, since it's a pretty obvious thing to do and there are O(100,000) ML researchers in the world. That said, it is entirely possible that it does exist and I just failed to find it with a cursory lit review.
Anyone familiar with similar research / deep enough in the weeds to know that it doesn't exist?
I think the ability to "just look up this code" is a demonstration of fluency - if your way of figuring out "what happens when I invoke this library function" is "read the source code", that indicates that you are able to fluently read code.
That said, fluently reading code and fluently writing code are somewhat different skills, and the very best developers in a given toolchain can do both with it.
When triggered to act, are the homeostatic-agents-as-envisioned-by-you motivated to decrease the future probability of being moved out of balance, or prolong the length of time in which they will be in balance, or something along these lines?
I expect[1] them to have a drive similar to "if my internal world-simulator predicts future sensory observations that are outside of my acceptable bounds, take actions to make the world-simulator predict within-acceptable-bounds sensory observations".
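Something like this toy sketch is what I have in mind (the names, thresholds, and setpoint are mine, purely illustrative):

def homeostatic_drive(predicted_obs: float, lower: float, upper: float) -> str:
    # Threshold-triggered drive: act only when the world-simulator's predicted
    # observation falls outside the acceptable band.
    if lower <= predicted_obs <= upper:
        return "no_action"            # in balance: no pressure to act
    return "act_to_restore_balance"   # out of balance: pick actions predicted
                                      # to bring the observation back in-band

def minimizing_drive(predicted_obs: float, setpoint: float) -> str:
    # Contrast: a pure minimizer acts on any deviation, however small.
    return "no_action" if predicted_obs == setpoint else "act_to_reduce_deviation"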
This drive maps reasonably well to one of the agent's drives being "decrease the future probability of being moved out of balance". Notably, though, it does not map well to that being the agent's only drive, or to the drive being "minimize" rather than "decrease if above threshold". The specific steps I don't understand are:
If no, they are probably not powerful agents. Powerful agency is the ability to optimize distant (in space, time, or conceptually) parts of the world into some target state
Why use this definition of powerful agency? Specifically, why include the "target state" part of it? By this metric, evolutionary pressure is not powerful agency, because while it can cause massive changes in distant parts of the world, there is no specific target state. Likewise for e.g. corporations finding a market niche - to the extent that they have a "target state" it's "become a good fit for the environment".
Or, rather... It's conceivable for an agent to be "tool-like" in this manner, where it has an incredibly advanced cognitive engine hooked up to a myopic suite of goals. But only if it's been intelligently designed. If it's produced by crude selection/optimization pressures, then the processes that spit out "unambitious" homeostatic agents would fail to instill the advanced cognitive/agent-y skills into them.
I can think of a few ways to interpret the above paragraph with respect to humans, but none of them make sense to me[2] - could you expand on what you mean there?
And a bundle of unbounded-consequentialist agents that have some structures for making cooperation between each other possible would have considerable advantages over a bundle of homeostatic agents.
Is this still true if the unbounded consequentialist agents in question have limited predictive power, and each one has advantages in predicting the things that are salient to it? Concretely, can an unbounded AAPL share price maximizer cooperate with an unbounded maximizer for the number of sand crabs in North America without the AAPL-maximizer having a deep understanding of sand crab biology?
[1] Subject to various assumptions at least, e.g.
Still, all those assumptions usually hold for humans.
[2] The obvious interpretation I take for that paragraph is that one of the following must be true:
For clarity, can you confirm that you don't think any of the following:
None of these seem like views I'd expect you to have, so my model has to be broken somewhere.
I am not one of them - I was wondering the same thing, and was hoping you had a good answer.
If I were trying to answer this question, I would probably try to figure out what fraction of all economically-valuable labor each year was cognitive, the breakdown of which tasks comprise that labor, and the year-on-year productivity increases on those tasks, then use that to compute the percentage of economically-valuable cognitive labor that was automated that year.
Concretely, to get a number for the US in 1900 I might use a weighted average of productivity increases across cognitive tasks in 1900, in an approach similar to how CPI is computed, and thus I would estimate that ~1% of all cognitive labor was automated in 1900. By the same methodology I would probably estimate closer to 5% for 2024.
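As a sketch of that kind of calculation (with entirely made-up task categories, weights, and automation fractions, chosen only to illustrate the arithmetic, not real estimates):

# Hypothetical breakdown of cognitive labor in 1900; every number below is a
# made-up placeholder, not an estimate drawn from any actual data source.
cognitive_tasks_1900 = {
    # task: (share of all cognitive labor, fraction of that task automated)
    "routine calculation":       (0.10, 0.05),
    "record keeping":            (0.30, 0.02),
    "communication / switching": (0.20, 0.01),
    "everything else":           (0.40, 0.00),
}

automated_share = sum(share * automated for share, automated in cognitive_tasks_1900.values())
print(f"~{automated_share:.0%} of cognitive labor automated")  # ~1% with these placeholders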
Again, though, I am not associated with Open Phil and am not sure if they think about cognitive task automation in the same way.