if AIs were completing 1-month-long, self-contained software engineering tasks (e.g. what a smart intern might do in the first month)
This doesn't seem like a good example to me.
The sort of tasks we're talking about are extrapolations of current benchmark tasks, so it's more like: what a programming savant with almost no ability to interact with colleagues or search out new context might do in a month given a self-contained, thoroughly specced and vetted task.
I expect current systems will naively scale to that, but not to the abilities of an arbitrary intern because that requires skills that aren't tested in the benchmarks.
Some people have strong negative priors toward AI in general.
When the GPT-3 API first came out, I built a little chatbot program to show my friends/family. Two people (out of maybe 15) flat out refused to put in a message because they just didn't like the idea of talking to an AI.
I think it's more of an instinctual reaction than something thought through. There's probably a deeper psychological explanation, but I don't want to speculate.
Rather than having objective standards, I find a growth-centric approach to be most effective. Optimizing for output is easy to Goodhart, so as much as possible I treat output as a metric rather than a goal. It's important that I'm getting more done now than I was a year ago, for example, but I don't explicitly aim for a particular output on a day-to-day basis. Instead I aim to optimize my processes and improve my skills, which leads to increased output. That applies not just to work performance, but to many other things.
score = (expected output * expected value of work per unit of output) / funding required
Compute this score for each person that needs funding, sort the list in descending order, and allocate funding in order from top to bottom (as sketched below). You don't need to fully solve the knapsack problem here, because leftover funding can be carried over.

From one perspective, nature does kind of incentivize cooperation in the long term. See The Goddess of Everything Else.
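Returning to the allocation heuristic above, here is a minimal Python sketch of the greedy procedure. The `Candidate` class, the example numbers, and the stop-and-carry-over behavior are illustrative assumptions, not details from the original comment:

```python
# Greedy funding allocation:
#   score = (expected output * expected value per unit of output) / funding required
# Fund candidates in descending score order; leftover budget carries over to the
# next round rather than being squeezed in knapsack-style.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expected_output: float     # expected units of work
    value_per_unit: float      # expected value of work per unit of output
    funding_required: float

    @property
    def score(self) -> float:
        return (self.expected_output * self.value_per_unit) / self.funding_required


def allocate(candidates: list[Candidate], budget: float) -> tuple[list[str], float]:
    """Fund candidates in score order until the next one can't be afforded;
    return who was funded and the leftover budget to carry over."""
    funded = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.funding_required > budget:
            break  # don't solve the knapsack problem; carry the rest over
        funded.append(c.name)
        budget -= c.funding_required
    return funded, budget


# Hypothetical example:
people = [
    Candidate("A", expected_output=10, value_per_unit=2.0, funding_required=5.0),
    Candidate("B", expected_output=4, value_per_unit=5.0, funding_required=8.0),
    Candidate("C", expected_output=3, value_per_unit=1.0, funding_required=2.0),
]
funded, leftover = allocate(people, budget=12.0)
print(funded, leftover)  # ['A'] 7.0 — B is too expensive this round, so 7.0 carries over
```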
A quick Google search for "probe-tuning" doesn't turn up anything. Do you have more info on it?
Probe-tuning doesn't train on the LLM's own "original rollouts" at all, only on the LLM's activations during the context pass through the LLM.
This sounds like regular fine-tuning to me. Unless you mean that the loss is calculated from one (or several?) of the network's activations rather than from the output logits.
Edit: I think I get what you mean now. You want to hook a probe to a model and fine-tune it to perform well as a probe classifier, right?
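If that's the right reading, here's a minimal sketch of what I have in mind, using PyTorch and a HuggingFace-style model. The model name, layer choice, and placeholder data are my own illustrative assumptions, not anything you specified:

```python
# Sketch: attach a linear probe to a hidden layer and train it (and optionally
# the base model) so the probe classifies well from the model's activations,
# without ever training on the model's own generated rollouts.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

probe = nn.Linear(model.config.hidden_size, 2)  # binary probe classifier

# Optimize only the probe here; add model.parameters() if the base model
# should also be fine-tuned to "perform well as a probe classifier".
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

text = "some context the probe should classify"  # placeholder data
label = torch.tensor([1])                        # placeholder label

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
activation = outputs.hidden_states[-1][:, -1, :]  # last layer, last token

loss = loss_fn(probe(activation), label)  # loss on the probe, not the output logits
loss.backward()
optimizer.step()
```

The key difference from regular fine-tuning, as I understand it, is that the loss comes from the probe's classification of internal activations rather than from the model's next-token predictions.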
It's also possible that there is some elegant, abstract "intelligence" concept (analogous to arithmetic) which evolution built into us but we don't understand yet and from which language developed. It just turns out that if you already have language, it's easier to work backwards from there to "intelligence" than to build it from scratch.
Then we'll need a "thought process tampering awareness" evaluation.