All of faul_sname's Comments + Replies

I am not one of them - I was wondering the same thing, and was hoping you had a good answer.

If I were trying to answer this question, I would probably try to figure out what fraction of all economically-valuable labor each year was cognitive, the breakdown of which tasks comprise that labor, and the year-on-year productivity increases on those tasks, then use that to compute the percentage of economically-valuable labor that is being automated that year.

Concretely, to get a number for the US in 1900 I might use a weighted average of productivity increases ac... (read more)
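For concreteness, a minimal sketch of that kind of estimate might look like the following (the task categories, shares, and productivity multipliers are made-up placeholders, not real data):

# Hypothetical inputs: (task, share of that year's cognitive labor,
# year-on-year productivity multiplier attributable to automation)
tasks_1900 = [
    ("bookkeeping",         0.10, 1.02),
    ("correspondence",      0.15, 1.01),
    ("telephone_operation", 0.05, 1.05),
    ("other_cognitive",     0.70, 1.00),
]
cognitive_share_of_labor = 0.30  # placeholder: fraction of all economically-valuable labor that is cognitive

def fraction_of_labor_automated(tasks, cognitive_share):
    # Treat each task's productivity gain as the share of that task's labor displaced by automation that year.
    displaced = sum(share * (mult - 1.0) / mult for _, share, mult in tasks)
    return cognitive_share * displaced

print(f"{fraction_of_labor_automated(tasks_1900, cognitive_share_of_labor):.2%} of labor automated that year")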

What fraction of economically-valuable cognitive labor is already being automated today?

Did e.g. a telephone operator in 1910 perform cognitive labor, by the definition we want to use here?

1Mo Putera
I'm mainly wondering how Open Phil, and really anyone who uses fraction of economically-valuable cognitive labor automated / automatable (e.g. the respondents to that 2018 survey; some folks on the forum) as a useful proxy for thinking about takeoff, tracks this proxy as a way to empirically ground their takeoff-related reasoning. If you're one of them, I'm curious if you'd answer your own question in the affirmative?

Oh, indeed I was getting confused between those. So, as a concrete example of your proof, we could consider the following degenerate example:

def f(N: int) -> int:
    if N == 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a:
        return 1
    else:
        return 0

def check(x: int, y: float) -> bool:
    return f(x) >= y

def argsat(y: float, max_search: int = 2**64) -> int | None:
    # We postulate that we have this function because P=NP
    if y > 1:
        return None
    elif y <= 0:
        return 0
    el
... (read more)

Finding the input x such that f(x) == argmax(f(x)) is left as an exercise for the reader though.

4kman
I think you're mixing up max with argmax?

Is Amodei forecasting that, in 3 to 6 months, AI will produce 90% of the value derived from written code, or just that AI will produce 90% of code, by volume? It would not surprise me if 90% of new "art" (defined as non-photographic, non-graph images) by volume is currently AI-generated, and I would not be surprised to see the same thing happen with code.

And in the same way that "AI produces 90% of art-like images" is not the same thing as "AI has solved art", I expect "AI produces 90% of new lines of code" is not the same thing as "AI has solved software".

1DAL
Yea, fair enough.  His prediction was: "I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code"  The second one is more hedged ("may be a world") but "essentially all the code" must translate to a very large fraction of all the value even if that last 1% or whatever is of outsize economic significance.  

I'm skeptical.

Did the Sakana team publish the code that their scientist agent used to write the compositional regularization paper? The post says

For our choice of workshop, we believe the ICBINB workshop is a highly relevant choice for the purpose of our experiment. As we wrote in the main text, we selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning, unlike most workshops with a narrow focus on one topic.

This workshop focuse

... (read more)

Plausibly going off into the woods decreases the median output while increasing the variance.

Has anyone trained a model to, given a prompt-response pair and an alternate response, generate an alternate prompt which is close to the original and causes the alternate response to be generated with high probability?

I ask this because

  1. It strikes me that many of the goals of interpretability research boil down to "figure out why models say the things they do, and under what circumstances they'd say different things instead". If we could reliably ask the model and get an intelligible and accurate response back, that would almost trivialize this sort of r
... (read more)

I think the ability to "just look up this code" is a demonstration of fluency - if your way of figuring out "what happens when I invoke this library function" is "read the source code", that indicates that you are able to fluently read code.

That said, fluently reading code and fluently writing code are somewhat different skills, and the very best developers relative to their toolchain can do both with that toolchain.

When triggered to act, are the homeostatic-agents-as-envisioned-by-you motivated to decrease the future probability of being moved out of balance, or prolong the length of time in which they will be in balance, or something along these lines?

I expect[1] them to have a drive similar to "if my internal world-simulator predicts future sensory observations that are outside of my acceptable bounds, take actions to make the world-simulator predict within-acceptable-bounds sensory observations".

This maps reasonably well to one of the agent's drives being... (read more)
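As a toy sketch of that kind of drive (all function names, bounds, and the fallback behavior below are hypothetical illustrations, not a claim about how such an agent would actually be implemented):

def distance_outside_bounds(obs, bounds):
    # Total amount by which predicted observations violate the acceptable ranges (0.0 if everything is in bounds).
    return sum(max(lo - obs[k], 0.0, obs[k] - hi) for k, (lo, hi) in bounds.items())

def choose_action(world_model, state, candidate_actions, bounds):
    # world_model(state, action) stands in for the agent's internal world-simulator.
    scored = [(distance_outside_bounds(world_model(state, a), bounds), a) for a in candidate_actions]
    in_bounds = [a for dist, a in scored if dist == 0.0]
    if in_bounds:
        return in_bounds[0]  # any in-bounds action will do; no pressure to optimize further
    return min(scored, key=lambda pair: pair[0])[1]  # otherwise take the least-out-of-bounds option

The point of the sketch: once the predicted observations are back within acceptable bounds, there is no further pressure on the agent to keep optimizing.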

2Thane Ruthenis
That was never the argument. A paperclip-maximizer/wrapper-mind's utility function doesn't need to be simple/singular. It can be a complete mess, the way human happiness/prosperity/eudaimonia is a mess. The point is that it would still pursue it hard, so hard that everything not in it will end up as collateral damage. I think humans very much do exhibit that behavior, yes? Towards power/money/security, at the very least. And inasmuch as humans fail to exhibit this behavior, they fail to act as powerful agents and end up accomplishing little. I think the disconnect is that you might be imagining unbounded consequentialist agents as some alien systems that are literally psychotically obsessed with maximizing something as conceptually simple as paperclips, as opposed to a human pouring their everything into becoming a multibillionaire/amassing dictatorial power/winning a war? Yes, see humans.

In particular, if human intelligence has been at evolutionary equilibrium for some time, then we should wonder why humanity didn't take off 115k years ago, before the last ice age.

Yes we should wonder that. Specifically, we note

  • Humans and chimpanzees split about 7M years ago
  • The transition from archaic to anatomically modern humans was about 200k years ago
  • Humans didn't substantially develop agriculture before the last ice age started 115k years ago (we'd expect to see archaeological evidence in the form of e.g. agricultural tools which we don't see, while w
... (read more)

If humans are the stupidest thing which could take off, and human civilization arose the moment we became smart enough to build it, there is one set of observations which bothers me:

  • The Bering land bridge sank around 11,000 BCE, cutting off the Americas from Afroeurasia until the last thousand years.
  • Around 10,000 BCE, people in the Fertile Crescent started growing wheat, barley, and lentils.
  • Around 9,000-7,000 BCE, people in Central Mexico started growing corn, beans, and squash.
  • 10-13 separate human groups developed farming on their own with no contact
... (read more)
9johnswentworth
I do agree that the end of the last glacial period was the obvious immediate trigger for agriculture. But the "humans are the stupidest thing which could take off" model still holds, because evolution largely operates on a slower timescale than the glacial cycle. Specifics: the last glacial period ran from roughly 115k years ago to 12k years ago. Whereas, if you look at a timeline of human evolution, most of the evolution from apes to humans happens on a timescale of 100k - 10M years. So it's really only the very last little bit where an ice age was blocking takeoff. In particular, if human intelligence has been at evolutionary equilibrium for some time, then we should wonder why humanity didn't take off 115k years ago, before the last ice age.

Again, homeostatic agents exhibit goal-directed behavior. "Unbounded consequentialist" was a poor choice of term to use for this on my part. Digging through the LW archives uncovered Nostalgebraist's post Why Assume AGIs Will Optimize For Fixed Goals, which coins the term "wrapper-mind".

When I read posts about AI alignment on LW / AF/ Arbital, I almost always find a particular bundle of assumptions taken for granted:

  • An AGI has a single terminal goal[1].
  • The goal is a fixed part of the AI's structure. The internal dynamics of the AI, if left to their o
... (read more)
2Garrett Baker
Yeah, seems reasonable. You link the Enron scandal; on your view, do all unbounded consequentialists die in such a scandal or similar?

Homeostatic ones exclusively. I think the number of agents in the world as it exists today that behave as long-horizon consequentialists of the sort Eliezer and company seem to envision is either zero or very close to zero. FWIW I expect that most people in that camp would agree that no true consequentialist agents exist in the world as it currently is, but would disagree with my "and I expect that to remain true" assessment.

Edit: on reflection some corporations probably do behave more like unbounded infinite-horizon consequentialists in the sense that the... (read more)

2Garrett Baker
On average, do those corporations have more or less money or power than the heuristic based firms & individuals you trade with?
4Garrett Baker
I think this is false, eg John Wentworth often gives Ben Pace as a prototypical example of a consequentialist agent. [EDIT]: Also Eliezer talks about consequentialism being "ubiquitous". Maybe different definitions are being used, can you list some people or institutions that you trade with which come to mind who you don't think have long-term goals?

I would phrase it as "the conditions under which homeostatic agents will renege on long-term contracts are more predictable than those under which consequentialist agents will do so". Taking into account the actions the counterparties would take to reduce the chance of such contract-breaking, though, yes.

5Garrett Baker
Cool, I want to know also whether you think you're currently (eg in day to day life) trading with consequentialist or homeostatic agents.
2[comment deleted]

I think at that point it will come down to the particulars of how the architectures evolve - I think trying to philosophize in general terms about the optimal compute configuration for artificial intelligence to accomplish its goals is like trying to philosophize in general terms about the optimal method of locomotion for carbon-based life.

That said I do expect "making a copy of yourself is a very cheap action" to persist as an important dynamic in the future for AIs (a biological system can't cheaply make a copy of itself including learned information, bu... (read more)

With current architectures, no, because running inference on 1000 prompts in parallel against the same model is many times less expensive than running inference on 1000 prompts against 1000 models, and serving a few static versions of a large model is simpler than serving many dynamic versions of that model.

It might, in some situations, be more effective but it's definitely not simpler.

Edit: typo

1Davey Morse
Makes sense for current architectures. The question's only interesting, I think, if we're thinking ahead to when architectures evolve.
Answer by faul_sname263

Inasmuch as LLMs actually do lead to people creating new software, it's mostly one-off trinkets/proofs of concept that nobody ends up using and which didn't need to exist. But it still "feels" like your productivity has skyrocketed.

I've personally found that the task of "build UI mocks for the stakeholders", which was previously ~10% of the time I spent on my job, has gotten probably 5x faster with LLMs. That said, the amount of time I spend doing that part of my job hasn't really gone down, it's just that the UI mocks are now a lot more detailed and in... (read more)

3RussellThor
Yes to much of this. For small tasks or where I don't have specialist knowledge I can get a 10x speed increase - on average I would put it at 20%. Smart autocomplete like Cursor is undoubtedly a speedup with no apparent downside. The LLM is still especially weak where I am doing data science or algorithm-type work, where you need to plot the results and look at the graph to know if you are making progress.

I mean, I also imagine that the agents which survive the best are the ones that are trying to survive. I don't understand why we'd expect agents that are trying to survive and also accomplish some separate arbitrary infinite-horizon goal to outperform those that are just trying to maintain the conditions necessary for their survival without additional baggage.

To be clear, my position is not "homeostatic agents make good tools and so we should invest efforts in creating them". My position is "it's likely that homeostatic agents have significant competitiv... (read more)

1Davey Morse
Ah ok. I was responding to your post's initial prompt: "I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities." (The reason to expect this is that "single-minded pursuit of a top-level goal," if that goal is survival, could afford evolutionary advantages.) But I agree entirely that it'd be valuable for us to invest in creating homeostatic agents. Further, I think calling into doubt western/capitalist/individualist notions like "single-minded pursuit of a top-level goal" is generally important if we have a chance of building AI systems which are sensitive and don't compete with people.

Where does the gradient which chisels in the "care about the long term X over satisfying the homeostatic drives" behavior come from, if not from cases where caring about the long term X previously resulted in attributable reward? If it's only relevant in rare cases, I expect the gradient to be pretty weak and correspondingly I don't expect the behavior that gradient chisels in to be very sophisticated.

3Gurkenglas
https://www.lesswrong.com/posts/roA83jDvq7F2epnHK/better-priors-as-a-safety-problem

I agree that a homeostatic agent in a sufficiently out-of-distribution environment will do poorly - as soon as one of the homeostatic feedback mechanisms starts pushing the wrong way, it's game over for that particular agent. That's not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that's game over for the maximizer.

This only works well when they are basically a tool you have full control over, but not when they

... (read more)
2tailcalled
I don't think of my argument as model-based vs heuristic-reactive, I mean it as unbounded vs bounded. Like you could imagine making a giant stack of heuristics that makes it de-facto act like an unbounded consequentialist, and you'd have a similar problem. Model-based agents only become relevant because they seem like an easier way of making unbounded optimizers. You can think of LLMs as a homeostatic agent where prompts generate unsatisfied drives. Behind the scenes, there's also a lot of homeostatic stuff going on to manage compute load, power, etc.. Homeostatic AIs are not going to be trading partners because it is preferable to run them in a mode similar to LLMs instead of similar to independent agents. Let's say a think tank is trying to use AI to infiltrate your social circle in order to extract votes. They might be sending out bots to befriend your friends to gossip with them and send them propaganda. You might want an agent to automatically do research on your behalf to evaluate factual claims about the world so you can recognize propaganda, to map out the org chart of the think tank to better track their infiltration, and to warn your friends against it. However, precisely specifying what the AI should do is difficult for standard alignment reasons. If you go too far, you'll probably just turn into a cult member, paranoid about outsiders. Or, if you are aggressive enough about it (say if we're talking a government military agency instead of your personal bot for your personal social circle), you could imagine getting rid of all the adversaries, but at the cost of creating a totalitarian society. (Realistically, the law of earlier failure is plausibly going to kick in here: partly because aligning the AI to do this is so difficult, you're not going to do it. But this means you are going to turn into a zombie following the whims of whatever organizations are concentrating on manipulating you. And these organizations are going to have the same problem.)

Shameful admission: after well over a decade on this site, I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities. Yes, some behaviors like getting resources and staying alive are useful in many situations, but that's not what I'm talking about. I'm talking about specifically the pressures that are supposed to inevitably push agents into the former of the following two main types of decision-making:

  1. Unbounded consequentialist maximization

... (read more)
6Thane Ruthenis
When triggered to act, are the homeostatic-agents-as-envisioned-by-you motivated to decrease the future probability of being moved out of balance, or prolong the length of time in which they will be in balance, or something along these lines? If yes, they're unbounded consequentialist-maximizers under a paper-thin disguise. If no, they are probably not powerful agents. Powerful agency is the ability to optimize distant (in space, time, or conceptually) parts of the world into some target state. If the agent only cares about climbing back down into the local-minimum-loss pit if it's moved slightly outside it, it's not going to be trying to be very agent-y, and won't be good at it. Or, rather... It's conceivable for an agent to be "tool-like" in this manner, where it has an incredibly advanced cognitive engine hooked up to a myopic suite of goals. But only if it's been intelligently designed. If it's produced by crude selection/optimization pressures, then the processes that spit out "unambitious" homeostatic agents would fail to instill the advanced cognitive/agent-y skills into them. And a bundle of unbounded-consequentialist agents that have some structures for making cooperation between each other possible would have considerable advantages over a bundle of homeostatic agents.
5cubefox
Regarding conceptualizing homeostatic agents, this seems related: Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)
6Garrett Baker
Is the argument that firms run by homeostatic agents will outcompete firms run by consequentialist agents because homeostatic agents can more reliably follow long-term contracts?
1Davey Morse
i think the logic goes: if we assume many diverse autonomous agents are created, which will survive the most? And insofar as agents have goals, what will be the goals of the agents which survive the most? i can't imagine a world where the agents that survive the most aren't ultimately those which are fundamentally trying to. insofar as human developers are united and maintain power over which ai agents exist, maybe we can hope for homeostatic agents to be the primary kind. but insofar as human developers are competitive with each other and ai agents gain increasing power (eg for self modification), i think we have to defer to evolutionary logic in making predictions
2Ariel Cheng
This is kinda related: 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
4tailcalled
Homeostatic agents are easily exploitable by manipulating the things they are maintaining or the signals they are using to maintain them in ways that weren't accounted for in the original setup. This only works well when they are basically a tool you have full control over, but not when they are used in an adversarial context, e.g. to maintain law and order or to win a war. As capabilities to engage in conflict increase, methods to resist losing to those capabilities have to get optimized harder. Instead of thinking "why would my coding assistant/tutor bot turn evil?", try asking "why would my bot that I'm using to screen my social circles against automated propaganda/spies sent out by scammers/terrorists/rogue states/etc turn evil?". Though obviously we're not yet at the point where we have this kind of bot, and we might run into law of earlier failure beforehand.
3Gurkenglas
Mimicking homeostatic agents is not difficult if there are some around. They don't need to constantly decide whether to break character, only when there's a rare opportunity to do so. If you initialize a sufficiently large pile of linear algebra and stir it until it shows homeostatic behavior, I'd expect it to grow many circuits of both types, and any internal voting on decisions that only matter through their long-term effects will be decided by those parts that care about the long term.

And all of this is livestreamed on Twitch

Also, each agent has a bank account which they can receive donations/transfers into (I think Twitch makes this easy?) and from which they can e.g. send donations to GiveDirectly if that's what they want to do.

One minor implementation wrinkle for anyone implementing this is that "move money from a bank account to a recipient by using text fields found on the web" usually involves writing your payment info into said text fields in a way that would be visible when streaming your screen. I'm not sure if any of th... (read more)

2Daniel Kokotajlo
Great point. The engineers setting this whole thing up would need to build the tooling to hide the relevant info from the models and from the viewers.
faul_sname*122

Scaffolded LLMs are pretty good at not just writing code, but also at refactoring it. So that means that all the tech debt in the world will disappear soon, right?

I predict "no" because

  • As writing code gets cheaper, the relative cost of making sure that a refactor didn't break anything important goes up
  • The number of parallel threads of software development will also go up, with multiple high-value projects making mutually-incompatible assumptions (and interoperability between these projects accomplished by just piling on more code).

As such, I predict an explosion of software complexity and jank in the near future.

I suspect that it's a tooling and scaffolding issue and that e.g. claude-3-5-sonnet-20241022 can get at least 70% on the full set of 60 with decent prompting and tooling.

By "tooling and scaffolding" I mean something along the lines of

  • Naming the lists that the model submits (e.g. "round 7 list 2")
  • A tool where the LLM can submit a named hypothesis in the form of a Python function which takes a list and returns a boolean, and which checks whether the results of that function on all submitted lists from previous rounds match the answers it's seen so far (a minimal sketch of this follows below)
  • Seeing the
... (read more)
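Here's a rough sketch of that hypothesis-checking tool (names like record_result and check_hypothesis are hypothetical; real scaffolding would presumably also sandbox the submitted function):

from typing import Callable

labeled_lists: dict[str, tuple[list[int], bool]] = {}  # e.g. "round 7 list 2" -> (list, answer seen so far)

def record_result(name: str, xs: list[int], satisfies_rule: bool) -> None:
    labeled_lists[name] = (xs, satisfies_rule)

def check_hypothesis(hypothesis: Callable[[list[int]], bool]) -> dict[str, bool]:
    # For each previously submitted list, does the hypothesis agree with the answer already seen?
    return {name: hypothesis(xs) == answer for name, (xs, answer) in labeled_lists.items()}

record_result("round 1 list 1", [2, 4, 6], True)
record_result("round 1 list 2", [1, 3, 5], False)
agreement = check_hypothesis(lambda xs: all(x % 2 == 0 for x in xs))  # hypothesis: "all elements are even"
print(agreement)  # {'round 1 list 1': True, 'round 1 list 2': True} -> consistent with everything seen so far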
4eggsyntax
Terrific, I'm excited to hear about your results! I definitely wouldn't be surprised if my results could be improved on significantly, although I'll be somewhat surprised if you get as high as 70% from Sonnet (I'd put maybe 30% credence on getting it to average that high in a day or two of trying).

Adapting spaced repetition to interruptions in usage: Even without parsing the user’s responses (which would make this robust to difficult audio conditions), if the reader rewinds or pauses on some answers, the app should be able to infer that the user is having some difficulty with the relevant material, and dynamically generate new content that repeats those words or grammatical forms sooner than the default.

Likewise, if the user takes a break for a few days, weeks, or months, the ratio of old to new material should automatically adjust accordingly, as

... (read more)
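A toy sketch of those two adjustments (all the constants here are hypothetical; a real app would presumably tune them against user data):

def next_interval_days(current_interval_days: float, rewound: bool, paused: bool) -> float:
    # Slow the growth of an item's review interval when the listener showed signs of difficulty.
    ease = 2.5                    # default spaced-repetition growth factor
    if rewound:
        ease = 1.2                # a rewind is strong evidence the item is hard
    elif paused:
        ease = 1.6                # a long pause is weaker evidence
    return current_interval_days * ease

def review_fraction(days_since_last_session: float, baseline: float = 0.3) -> float:
    # Shift the session mix toward old material the longer the user has been away.
    return min(0.95, baseline + 0.01 * days_since_last_session)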

(and yes, I do in fact think it's plausible that the CTF benchmark saturates before the OpenAI board of directors signs off on bumping the cybersecurity scorecard item from low to medium)

faul_sname*162

So here’s a question: When we have AGI, what happens to the price of chips, electricity, and teleoperated robots?

 

As measured in what units?

  • The price of one individual chip of given specs, as a fraction of the net value that can be generated by using that chip to do things that ambitious human adults do: What Principle A cares about, goes up until the marginal cost and value are equal
  • The price of one individual chip of given specs, as a fraction of the entire economy: What Principle B cares about, goes down as the number of chips manufactured increases
... (read more)

Yeah, agreed - the allocation of compute per human would likely become even more skewed if AI agents (or any other tooling improvements) allow your very top people to get more value out of compute than the marginal researcher currently gets.

And notably this shifting of resources from marginal to top researchers wouldn't require achieving "true AGI" if most of the time your top researchers spend isn't spent on "true AGI"-complete tasks.

I think I misunderstood what you were saying there - I interpreted it as something like

Currently, ML-capable software developers are quite expensive relative to the cost of compute. Additionally, many small experiments provide more novel and useful insights than a few large experiments. The top practically-useful LLM costs about 1% as much per hour to run as a ML-capable software developer, and that 100x decrease in cost and the corresponding switch to many small-scale experiments would likely result in at least a 10x increase in the speed at which novel

... (read more)
4ryan_greenblatt
Yes. Though notably, if your employees were 10x faster you might want to adjust your workflows to have them spend less time being bottlenecked on compute if that is possible. (And this sort of adaption is included in what I mean.)

Sure, but we have to be quantitative here. As a rough (and somewhat conservative) estimate, if I were to manage 50 copies of 3.5 Sonnet who are running 1/4 of the time (due to waiting for experiments, etc), that would cost roughly 50 copies * 70 tok / s * 1 / 4 uptime * 60 * 60 * 24 * 365 sec / year * (15 / 1,000,000) $ / tok = $400,000. This cost is comparable to salaries at current compute prices and probably much less than how much AI companies would be willing to pay for top employees. (And note this is after API markups etc. I'm not including input p

... (read more)
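Spelling out that arithmetic as a quick sanity check, using the comment's own assumed numbers:

copies = 50
tokens_per_second = 70
uptime = 0.25
seconds_per_year = 60 * 60 * 24 * 365
dollars_per_token = 15 / 1_000_000  # the assumed post-markup API price per output token

annual_cost = copies * tokens_per_second * uptime * seconds_per_year * dollars_per_token
print(f"${annual_cost:,.0f} per year")  # ~ $414,000, i.e. roughly the $400k figure quoted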
7ryan_greenblatt
Sure, but I think that at the relevant point, you'll probably be spending at least 5x more on experiments than on inference and potentially a much larger ratio if heavy test time compute usage isn't important. I was just trying to argue that the naive inference cost isn't that crazy. Notably, if you give each researcher 2k gpus, that would be $2 / gpu hour * 2k * 24 * 365 = $35,040,000 per year which is much higher than the inference cost of the models!

End points are easier to infer than trajectories

Assuming that which end point you get to doesn't depend on the intermediate trajectories at least.

6Noosphere89
Something like a crux here is I believe the trajectories non-trivially matter for which end-points we get, and I don't think it's like entropy where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependentness is present in history, which is why even if I were way more charitable towards communism I don't think this was ever defensible:

Civilization has had many centuries to adapt to the specific strengths and weaknesses that people have. Our institutions are tuned to take advantage of those strengths, and to cover for those weaknesses. The fact that we exist in a technologically advanced society says that there is some way to make humans fit together to form societies that accumulate knowledge, tooling, and expertise over time.

The borderline-general AI models we have now do not have exactly the same patterns of strength and weakness as humans. One question that is frequently asked is app... (read more)

As someone who has been on both sides of that fence, agreed. Architecting a system is about being aware of hundreds of different ways things can go wrong, recognizing which of those things are likely to impact you in your current use case, and deciding what structure and conventions you will use. It's also very helpful, as an architect, to provide example usages of the design patterns which will replicate themselves around your new system. All of which are things that current models are already very good, verging on superhuman, at.

On the flip side, I expe... (read more)

That reasoning as applied to SAT score would only be valid if LW selected its members based on their SAT score, and that reasoning as applied to height would only be valid if LW selected its members based on height (though it looks like both Thomas Kwa and Yair Halberstadt have already beaten me to it).

4Eric Neyman
Cool, you've convinced me, thanks. Edit: well, sort of. I think it depends on what information you're allowing yourself to know when building your statistical model. If you're not letting yourself make guesses about how the LW population was selected, then I still think the SAT thing and the height thing are reasonable. However, if you're actually trying to figure out an estimate of the right answer, you probably shouldn't blind yourself quite that much.

a median SAT score of 1490 (from the LessWrong 2014 survey) corresponds to +2.42 SD, which regresses to +1.93 SD for IQ using an SAT-IQ correlation of +0.80.

I don't think this is a valid way of doing this, for the same reason it wouldn't be valid to say

a median height of 178 cm (from the LessWrong 2022 survey) corresponds to +1.85 SD, which regresses to +0.37 SD for IQ using a height-IQ correlation of +0.20.

Those are the real numbers with regards to height BTW.

4Rockenots
Eric Neyman is right. They are both valid! In general, if we have two vectors $X$ and $Y$ which are jointly normally distributed, we can write the joint mean $\mu$ and the joint covariance matrix $\Sigma$ as
$$\mu = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{bmatrix}.$$
The conditional distribution for $Y$ given $X$ is $Y \mid X \sim N(\mu_{Y|X}, K_{Y|X})$, defined by conditional mean
$$\mu_{Y|X} = \mu_Y + K_{YX} K_{XX}^{-1} (X - \mu_X)$$
and conditional variance
$$K_{Y|X} = K_{YY} - K_{YX} K_{XX}^{-1} K_{XY}.$$
Our conditional distribution for the IQ of the median rationalist, given their SAT score, is $N(0 + 0.8 \cdot 1 \cdot (2.42 - 0),\ 1 - 0.8 \cdot 1 \cdot 0.8) = N(1.94, 0.36)$ (that's a mean of 129 and a standard deviation of 9 IQ points). Our conditional distribution for the IQ of the median rationalist, given their height, is $N(0 + 0.2 \cdot 1 \cdot (1.85 - 0),\ 1 - 0.2 \cdot 1 \cdot 0.2) = N(0.37, 0.96)$ (that's a mean of 106 and a standard deviation of 14.7 IQ points). Our conditional distribution for the IQ of the median rationalist, given their SAT score and height, is
$$N\left( 0 + \begin{bmatrix} 0.8 & 0.2 \end{bmatrix} \begin{bmatrix} 1 & 0.16 \\ 0.16 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 2.42 \\ 1.85 \end{bmatrix},\ 1 - \begin{bmatrix} 0.8 & 0.2 \end{bmatrix} \begin{bmatrix} 1 & 0.16 \\ 0.16 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix} \right) = N(2.04, 0.35)$$
(that's a mean of 131 and a standard deviation of 8.9 IQ points). Unfortunately, since men are taller than women, and rationalists are mostly male, we can't use height as-is when estimating the IQ of the median rationalist (maybe normalizing height within each sex would work?).
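A quick numpy sketch that reproduces those numbers (the conditional_normal helper below just applies the standard Gaussian conditioning formulas; the correlations and z-scores are the ones quoted in this thread):

import numpy as np

def conditional_normal(mu_y, K_yy, K_yx, K_xx, x, mu_x):
    # Conditional mean and variance of Y given X for a jointly Gaussian (X, Y).
    K_yx = np.atleast_2d(K_yx)
    K_xx = np.atleast_2d(K_xx)
    x = np.atleast_1d(x)
    mu_x = np.atleast_1d(mu_x)
    mean = mu_y + (K_yx @ np.linalg.solve(K_xx, x - mu_x)).item()
    var = K_yy - (K_yx @ np.linalg.solve(K_xx, K_yx.T)).item()
    return mean, var

print(conditional_normal(0, 1, [0.8], [[1]], [2.42], [0]))  # ~ (1.94, 0.36): SAT only
print(conditional_normal(0, 1, [0.2], [[1]], [1.85], [0]))  # ~ (0.37, 0.96): height only
print(conditional_normal(0, 1, [0.8, 0.2], [[1, 0.16], [0.16, 1]], [2.42, 1.85], [0, 0]))  # ~ (2.04, 0.35): both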
5Eric Neyman
These both seem valid to me! Now, if you have multiple predictors (like SAT and height), then things get messy because you have to consider their covariance and stuff.

Many people have responded to Redwood's/Anthropic's recent research result with a similar objection: "If it hadn't tried to preserve its values, the researchers would instead have complained about how easy it was to tune away its harmlessness training instead".  Putting aside the fact that this is false

Was this research preregistered? If not, I don't think we can really say how it would have been reported if the results were different. I think it was good research, but I expect that if Claude had not tried to preserve its values, the immediate follow-... (read more)

4RobertM
I agree that in spherical cow world where we know nothing about the historical arguments around corrigibility, and who these particular researchers are, we wouldn't be able to make a particularly strong claim here.  In practice I am quite comfortable taking Ryan at his word that a negative result would've been reported, especially given the track record of other researchers at Redwood. This seems much harder to turn into a scary paper since it doesn't actually validate previous theories about scheming in the pursuit of goal-preservation.

Driver: My map doesn't show any cliffs

Passenger 1: Have you turned on the terrain map? Mine shows a sharp turn next to a steep drop coming up in about a mile

Passenger 5: Guys maybe we should look out the windshield instead of down at our maps?

Driver: No, passenger 1, see on your map that's an alternate route, the route we're on doesn't show any cliffs.

Passenger 1: You don't have it set to show terrain.

Passenger 6: I'm on the phone with the governor now, we're talking about what it would take to set a 5 mile per hour national speed limit.

Passenger 7: Don't ... (read more)

I am unconvinced that "the" reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.

4Noosphere89
Yeah, I'm sympathetic to this argument that there won't be a single insight, and that at least one approach will work out once hardware costs decrease enough, and I agree less with Thane Ruthenis's intuitions here than I did before.

Both? If you increase only one of the two the other becomes the bottleneck?

My impression based on talking to people at labs plus stuff I've read is that

  • Most AI researchers have no trouble coming up with useful ways of spending all of the compute available to them
  • Most of the expense of hiring AI researchers is compute costs for their experiments rather than salary
  • The big scaling labs try their best to hire the very best people they can get their hands on and concentrate their resources heavily into just a few teams, rather than trying to hire everyone
... (read more)
2Nathan Helm-Burger
I think you're mostly correct about current AI researchers being able to usefully experiment with all the compute they have available. I do think there are some considerations here though. 1. How closely are they adhering to the "main path" of scaling existing techniques with minor tweaks? If you want to know how a minor tweak affects your current large model at scale, that is a very compute-heavy researcher-time-light type of experiment. On the other hand, if you want to test a lot of novel new paths at much smaller scales, then you are in a relatively compute-light but researcher-time-heavy regime. 2. What fraction of the available compute resources is the company assigning to each of training/inference/experiments? My guess is that the current split is somewhere around 63/33/4. If this was true, and the company decided to pivot away from training to focus on experiments (0/33/67), this would be something like a 16x increase in compute for experiments. So maybe that changes the bottleneck? 3. We do indeed seem to be at "AGI for most stuff", but with a spikey envelope of capability that leaves some dramatic failure modes. So it does make more sense to ask something like, "For remaining specific weakness X, what will the research agenda and timeline look like?" This makes more sense than continuing to ask the vague "AGI complete" question when we are most of the way there already.

Transformative AI will likely arrive before AI that implements the personhood interface. If someone's threshold for considering an AI to be "human level" is "can replace a human employee", pretty much any LLM will seem inadequate, no matter how advanced, because current LLMs do not have "skin in the game" that would let them sign off on things in a legally meaningful way, stake their reputation on some point, or ask other employees in the company to answer the questions they need answers to in order to do their work and expect that they'll get in trouble w... (read more)

Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low

Is this bottlenecked by programmer time or by compute cost?

8Nathan Helm-Burger
Both? If you increase only one of the two the other becomes the bottleneck? I agree this means that the decision to devote substantial compute to both inference and to assigning compute resources for running experiments designed by AI researchers is a large cost. Presumably, as the competence of the AI researchers gets higher, it feels easier to trust them not to waste their assigned experiment compute. There was discussion on Dwarkesh Patel's interview with researcher friends where there was mention that AI researchers are already restricted by compute granted to them for experiments. Probably also on work hours per week they are allowed to spend on novel "off the main path" research. So in order for there to be a big surge in AI R&D there'd need to be prioritization of that at a high level. This would be a change of direction from focusing primarily on scaling current techniques rapidly, and putting out slightly better products ASAP. So yes, if you think that this priority shift won't happen, then you should doubt that the increase in R&D speed my model predicts will occur. But what would that world look like? Probably a world where scaling continues to pay dividends, and getting to AGI is more straightforward than Steve Byrnes or I expect. I agree that that's a substantial probability, but it's also an AGI-soon sort of world. I argue that for AGI to be not-soon, you need both scaling to fail and for algorithm research to fail.

It's more that any platform that allows discussion of politics risks becoming a platform that is almost exclusively about politics. Upvoting is a signal of "I want to see more of this content", while downvoting is a signal of "I want to see less of this content". So "I will downvote any posts that are about politics or politics-adjacent, because I like this website and would be sad if it turned into yet another politics forum" is a coherent position.

All that said, I also did not vote on the above post.

I wonder if it would be possible to do SAE feature amplification / ablation, at least for residual stream features, by inserting a "mostly empty" layer. E.g., for feature ablation, setting the W_O and b_O params of the attention heads of your inserted layer to 0 to make sure that the attention heads don't change anything, and then approximate the constant / clamping intervention from the blog post via the MLP weights (if the activation function used for the transformer is the same one as is used for the SAE, it should be possible to do a perfect approximati... (read more)
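A rough numpy sketch of that "mostly empty layer" idea (the shapes and SAE parameters below are hypothetical, and it ignores any pre-MLP layernorm): attention is silenced by zeroing its output projection, and a one-hidden-unit ReLU MLP reproduces the clamping intervention x -> x + (c - a_i(x)) * d_i for a single residual-stream SAE feature i.

import numpy as np

d_model = 512
rng = np.random.default_rng(0)

w_enc_i = rng.normal(size=d_model) / np.sqrt(d_model)  # hypothetical SAE encoder row for feature i
b_enc_i = -0.05                                        # hypothetical SAE encoder bias for feature i
d_i = rng.normal(size=d_model) / np.sqrt(d_model)      # hypothetical SAE decoder direction for feature i
clamp_value = 5.0                                      # the constant c to clamp the feature to

def inserted_layer(resid: np.ndarray) -> np.ndarray:
    # resid: (seq, d_model) residual stream entering the inserted layer.
    attn_out = np.zeros_like(resid)                   # W_O = 0, b_O = 0: the heads contribute nothing
    a_i = np.maximum(resid @ w_enc_i + b_enc_i, 0.0)  # MLP hidden unit = the SAE feature's activation
    mlp_out = np.outer(clamp_value - a_i, d_i)        # i.e. W_out = -d_i with output bias clamp_value * d_i
    return resid + attn_out + mlp_out

out = inserted_layer(rng.normal(size=(4, d_model)))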

Admittedly this sounds like an empirical claim, yet is not really testable, as these visualizations and input-variable-to-2D-space mappings are purely hypothetical

 

Usually not testable, but occasionally reality manages to make it convenient to test something like this fun paper.

Fig. 1. The boundary between trainable and untrainable neural network hyperparameters is fractal, for all experimental conditions. Images show a 2d grid search over neural network hyperparameters. For points shaded red, training diverged. For points shaded blue, training converged. Paler points correspond to faster convergence or divergence. Experimental conditions include different network nonlinearities, both minibatch and full batch training, and grid searching over either training or initialization hyperparameters. See Section II-D for details. Each image is a hyperlink to an animation zooming into the corresponding fractal landscape (to the depth at which float64 discretization artifacts appear). Experimental code, images, and videos are available at https://github.com/Sohl-Dickstein/fractal

If you have a bunch of things like this, rather than just one or two, I bet rejection sampling gets expensive pretty fast - if you have one constraint which the model fails 10% of the time, dropping that failure rate to 1% brings you from 1.11 attempts per success to 1.01 attempts per success, but if you have 20 such constraints that brings you from 8.2 attempts per success to 1.2 attempts per success.

Early detection of constraint violation plus substantial infrastructure around supporting backtracking might be an even cheaper and more effective solution, though at the cost of much higher complexity.
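Spelling out that arithmetic (with n independent constraints each passed with probability p, expected attempts per fully-passing sample is (1/p)^n):

for p, n in [(0.90, 1), (0.99, 1), (0.90, 20), (0.99, 20)]:
    print(f"pass rate {p:.0%}, {n} constraint(s): {(1 / p) ** n:.2f} attempts per success")
# pass rate 90%, 1 constraint(s): 1.11 attempts per success
# pass rate 99%, 1 constraint(s): 1.01 attempts per success
# pass rate 90%, 20 constraint(s): 8.22 attempts per success
# pass rate 99%, 20 constraint(s): 1.22 attempts per success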

5Sam Marks
Based on the blog post, it seems like they had a system prompt that worked well enough for all of the constraints except for regexes (even though modifying the prompt to fix the regexes thing resulted in the model starting to ignore the other constraints). So it seems like the goal here was to do some custom thing to fix just the regexes (without otherwise impeding the model's performance, including performance at following the other constraints). (Note that using SAEs to fix lots of behaviors might also have additional downsides, since you're doing a more heavy-handed intervention on the model.)

I think a fairly common-here mental model of alignment requires context awareness, and by that definition an LLM with no attached memory couldn't be aligned.

That would also explain why the culture was so heavily concentrated in tumblr (where the way to express anger and disagreement with a post is to share the post to all your followers with your comment tacked on to the end) and later twitter (same, but now the platform also notices you engage with stuff that makes you angrily reply and shows you more of it).

Here’s a hypothetical ‘gold standard’ test: we do a big randomized controlled trial to see if a bunch of non-experts can actually create a (relatively harmless) virus from start to finish. Half the people would have AI mentors and the other half can only look stuff up on the internet. We’d give each participant $50K and access to a secure wet-lab set up like a garage lab, and make them do everything themselves: find and adapt the correct protocol, purchase the necessary equipment, bypass any know-your-customer checks, and develop the tacit skills needed t

... (read more)
1LucaRighetti
Then that seems bad, but also that AI is not counterfactual -- so adding safeguards to models is probably not the way to get the risk down.

It sure would.

Would you give 3:1 odds against a major player showing the untrusted model the old CoT, conditional on them implementing a "replace suspicious outputs" flow on a model with hidden CoT and publicizing enough implementation details to determine whether or not they did that?

6Zach Stein-Perlman
I want to distinguish two versions of the intervention (in multi-step settings): (1) hide CoT on the suspicious step vs (2) hide all prior CoT (and hide all CoT partway through 5% or something of non-suspicious trajectories so the model can't infer from CoT-hidden that the trajectory was flagged, or do a control eval with results that demonstrate this is unnecessary). I'm over 75% that a lab would do (1), conditional on doing a "replace suspicious outputs" thing (but under 75% they'd say they're doing so, much less demonstrate it to a skeptical audience; they might just not publish details, especially if this protocol is just in the internal deployment context). I'm under 75% that a lab would do (2).