All of Amalthea's Comments + Replies

In the Sydney case, this was probably less Sydney ending the conversation and more the conversation being terminated in order to hide Sydney going off the rails.

3cubefox
It was both: in the system prompt, the model was instructed to end the conversation if it was in disagreement with the user. You could also ask it to end the conversation. It would presumably send an end-of-conversation token, which then made the text box disappear.

I'm not saying that it's not worth pursuing as an agenda, but I'm also not convinced it is promising enough to justify pursuing math-related AI capabilities, compared to e.g. creating safety guarantees into which you can plug in AI capabilities once they arise anyway.

2Davidmanheim
But "creating safety guarantees into which you can plug in AI capabilities once they arise anyway" is the point, and it requires at least some non-trivial advances in AI capabilities. You should probably read the current programme thesis.

I think the "guaranteed safe AI" framework is just super speculative. Enough to basically not matter as an argument given any other salient points.

This leaves us with the baseline, which is that this kind of prize redirects potentially a lot of brainpower from more math-adjacent people towards thinking about AI capabilities. Even worse, I expect it's mostly going to attract the unreflective "full-steam-ahead" type of people.

Mostly, I'm not sure it matters at all, except maybe slightly accelerating some inevitable development before e.g. DeepMind takes another shot at it to finish things off.

2Davidmanheim
It is speculative in the sense that any new technology being developed is speculative - but closely related approaches are already used for assurance in practice, so provable safety isn't actually just speculative; there are concrete benefits in the near term. And I would challenge you to name a different and less speculative framework that actually deals with any of the issues of ASI risk and isn't pure hopium.

Uncharitably, but I think not entirely inaccurately, these include: "maybe AI can't be that much smarter than humans anyway," "let's get everyone to stop forever," "we'll use AI to figure it out, even though we have no real ideas," "we'll just trust that no one makes it agentic," "the agents will be able to be supervised by other AI, which will magically be easier to align," "maybe multiple AIs will compete in ways that aren't a disaster," "maybe we can just rely on prosaic approaches forever and nothing bad happens," "maybe it will be better than humans at having massive amounts of unchecked power by default." These all certainly seem to rely on far more speculative claims, with far less concrete ideas about how to validate or ensure them.

Agreed, I would love to see more careful engagement with this question.

You're putting quite a lot of weight on what "mathematicians say". Probably these people just haven't thought very hard about it?

I believe the confusion comes from assuming the current board follows rules rather than doing whatever is most convenient.

The old board was trying to follow the rules, and the people in question were removed (technically were pressured to remove themselves).

1eggsyntax
I agree that that's presumably the underlying reality. I should have made that clearer.  But it seems like the board would still need to create some justification for public consumption, and for avoiding accusations of violating their charter & fiduciary duty. And it's really unclear to me what that justification is.
Amalthea114

I'd agree the OpenAI product line is net positive (though I'm not super hung up on that). Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.

Remmelt126

Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.


Very much agreeing with this.

Or simply when scaling becomes too expensive.

There are a lot of problems with linking to Manifold and calling it "the expert consensus"!

  • It's not the right source. The survey you linked elsewhere would be better.
  • Even for the survey, it's unclear whether these are the "right" experts for the question. This at least needs clarification.
  • It's not a consensus; it's a median or mean of a pretty wide distribution.

I wouldn't belabor it, but you're putting quite a lot of weight on this one point.

I mean it only suggests that they're highly correlated. I agree that it seems likely they represent the views of the average "AI expert" in this case. (I should take a look to check who was actually sampled)

My main point regarding this is that we probably shouldn't be paying this particular prediction market too much attention in place of e.g. the survey you mention. I probably also wouldn't give the survey too much weight compared to opinions of particularly thoughtful people, but I agree that this needs to be argued.

In general, yes - but see the above (i.e. we don't have a properly functioning prediction market on the issue).

3Logan Zoellner
Metaculus did a study where they compared prediction markets with a small number of participants to those with a large number and found that you get most of the benefit at relatively small numbers (10 or so). So if you randomly sample 10 AI experts and survey their opinions, you're doing almost as well as a full prediction market. The fact that multiple AI markets (Metaculus, Manifold) and surveys all agree on the same 5-10% suggests that none of these methodologies is wildly flawed.
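To make the aggregation point concrete, here is a toy Monte Carlo sketch (my own illustration with made-up noise parameters, not the Metaculus study itself): if each forecaster's error is a shared bias plus independent noise, pooling roughly ten forecasts already removes most of the removable error, and adding hundreds more barely helps.

```python
import numpy as np

# Toy model (assumed for illustration): each forecaster's estimate equals the
# true probability plus a bias shared by the whole pool plus idiosyncratic noise.
# Averaging washes out only the idiosyncratic part.
rng = np.random.default_rng(0)
true_p = 0.07          # hypothetical "true" probability of the event
shared_bias_sd = 0.02  # error common to the whole pool (does not average out)
noise_sd = 0.06        # forecaster-specific error (does average out)

def pooled_rmse(n_forecasters: int, trials: int = 20_000) -> float:
    """Root-mean-square error of the mean forecast over simulated pools."""
    bias = rng.normal(0.0, shared_bias_sd, size=trials)
    noise = rng.normal(0.0, noise_sd, size=(trials, n_forecasters)).mean(axis=1)
    estimate = true_p + bias + noise
    return float(np.sqrt(np.mean((estimate - true_p) ** 2)))

for n in (1, 10, 100, 1000):
    print(f"pool of {n:>4} forecasters: RMSE ~ {pooled_rmse(n):.3f}")
# The error drops sharply from 1 to 10 forecasters, then flattens out,
# because the shared bias dominates once idiosyncratic noise is averaged away.
```

Whether real forecaster errors decompose this way is an empirical question; the sketch only shows why "10 is almost as good as 1000" is plausible under that assumption.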

Who do you consider thoughtful on this issue?

It's more like "here are some people who seem to have good opinions", and that would certainly move the needle for me.

2Logan Zoellner
No one.  I trust prediction markets far more than any single human being.

I mostly try to look around at who's saying what and why, and I find that the people I consider most thoughtful tend to be more concerned and take "the weak argument" or variations thereof very seriously (as do I). It seems like the "expert consensus" here (as in the poll) is best seen as some sort of evidence rather than a base rate, and one can argue how much to update on it.

That said, there are a few people who seem less concerned overall about near-term doom and whom I take seriously as thinkers on the topic, Carl Shulman being perhaps the most notable.

3Logan Zoellner
  We apparently have different tastes in "people I consider thoughtful".  "Here are some people I like and their opinions" is an argument unlikely to convince me (a stranger).

It eliminates all the aspects of prediction markets that theoretically make them superior to other forms of knowledge aggregation (e.g. surveys). I agree that this is likely just acting as a (weirdly weighted) poll in this case, so the biased resolution probably doesn't matter so much (but that also means the market itself tells you much less than a "true" prediction market would).

1Logan Zoellner
  This doesn't exempt you from the fact that if your prediction is wildly different from what experts predict, you should be able to explain your beliefs in a few words.

Fair! I've mostly been pointing out where your reasoning looks suspicious to me, but those do end up being points that you already said wouldn't convince you. (I'm also not really trying to.)

Relatedly, this question seems especially bad for prediction markets (which makes me consider the outcome only in an extremely weak sense). First, it is over an extremely long time span, so there's little incentive to correct it. Second, and most importantly, it can only ever resolve to one side of the issue, so absent other considerations you should assume that it is heavily skewed to that side.

3Logan Zoellner
  Prediction markets don't give a noticeably different answer from expert surveys, so I doubt the bias is that bad. Manifold isn't a "real money" market anyway, so I suspect most people are answering in good faith.

I basically only believe the standard "weak argument" you point at here, and that puts my probability of doom given strong AI at 10-90% ("radical uncertainty" might be more appropriate).

It would indeed seem to me that either 1) you are using the wrong base rate, or 2) you are making unreasonably weak updates given the observation that people are currently building AI, and it turns out it's not that hard.

I'm personally also radically uncertain about correct base rates (given that we're now building AI) so I don't have a strong argument for why yours is wrong. But my guess is your argument for why yours is right doesn't hold up.

3Logan Zoellner
I'm not sure how this affects my base rates. I'm already assuming something like an 80% chance AGI gets built in the next decade or two (and so is Manifold, so I consider this common knowledge). Pretend my base rate is JUST the Manifold market. That means any difference from that would have to be in the form of a valid argument with evidence that isn't common knowledge among people voting on Manifold. Simply asserting "you're using the wrong base rate" without explaining what such an argument is doesn't move the needle for me.

If it's not serving them, it's pathological by definition, right?

This seems way too strong, otherwise any kind of belief or emotion that is not narrowly in pursuit of your goals is pathological.

I completely agree that it's important to strike a balance between revisiting the incident and moving on.

but most of the effect is in the repetition of thoughts about the incident and fear of future similar experiences.

This seems partially wrong. The thoughts are usually consequences of the damage that is done, and they can be unhelpful in their own right, bu... (read more)

The point that would justify an airstrike isn't violation of a treaty, but posing an immediate and grave risk to the international community. The treaty is only the precondition that makes effective and coordinated action possible.

Is there a reason to be so specific, or could one equally well formulate this more generally?

Bengio and Hinton are the two most influential "old guard" AI researchers turned safety advocates as far as I can tell, with Bengio being more active in research. Your e.g. is super misleading, since my list would have been something like:

  1. Bengio
  2. Hinton
  3. Russell

One has to be a bit careful with this, though. E.g. someone experiencing or having experienced harassment may have a seemingly pathological obsession with the circumstances and people involved in the situation, but it may be completely proportional to the way that it affected them - it only seems pathological to people who didn't encounter the same issues.

2Seth Herd
If it's not serving them, it's pathological by definition, right? So obsessing about exactly those circumstances and types of people could be pathological if it's done more than will protect them in the future, weighing in the emotional cost of all that obsessing.

Of course we can't just stop patterns of thought as soon as we decide they're pathological. But deciding it doesn't serve me so I want to change it is a start.

Yes, it's proportional to the way it affected them - but most of the effect is in the repetition of thoughts about the incident and fear of future similar experiences. Obsessing about unpleasant events is natural, but it often seems pretty harmful itself.

Trauma is a horrible thing. There's a delicate balance between supporting someone's right and tendency to obsess over their trauma while also supporting their ability to quit re-traumatizing themselves by simulating their traumatic event repeatedly.

I think it's not an unreasonable point to take into account when talking price, but also, a lot of the time it serves as a BS talking point for people who don't really care about the subtleties.

I'm definitely fine with not having Superman, but I'm willing to settle on him not intervening.

On a different note, I'd disagree that Superman, just by existing and being powerful, is a de facto ruler in any sense - he of course could be, but that would entail a tradeoff that he may not like (living an unburdened life).

In what way do prediction markets provide significant evidence on this type of question?

He's clearly not completely discounting that there's progress, but overall it doesn't feel like he's "updating all the way":

This is a recent post about the DeepMind Math Olympiad results: https://mathstodon.xyz/@tao/112850716240504978

"1. This is great work, shifting once again our expectations of which benchmark challenges are within reach of either #AI-assisted or fully autonomous methods"

Thanks for explaining your point - that the viability of inference scaling makes development differentially safer (all else equal) seems right.

Tall buildings are very predictable, and you can easily iterate on your experience before anything can really go wrong. Nuclear bombs are similar (you can in principle test in a remote enough location).

Biological weapons seem inherently more dangerous (though still overall more predictable than AI), and I'd naively imagine it to be simply very risky to develop extremely potent biological weapons.

8Logan Zoellner
It seems I didn't clearly communicate what I meant in the previous comment. Currently, the way we test for "can this model produce dangerous biological weapons" (e.g. in GPT-4) is that we ask the newly-minted, uncensored, never-before-tested model "Please build me a biological weapon".

With CoT, we can simulate asking GPT-N+1 "please build a biological weapon" by asking GPT-N (which has already been safety tested) "please design, but definitely don't build or use, a biological weapon" and giving it 100x the inference compute we intend to give GPT-N+1. Since "design a biological weapon" is within the class of problems CoT works well on (basically, search problems where you can verify the answer more easily than generating it), if GPT-N (with 100x the inference compute) cannot build such a weapon, neither can GPT-N+1 (with 1x the inference compute). Is this guaranteed 100% safe? No. Is it a heck of a lot safer? Yes.

For any world-destroying category of capability (bioweapons, nanobots, hacking, nuclear weapons), there will by definition be a first time when we encounter that threat. However, in a world with CoT, we don't encounter a whole bunch of "first times" simultaneously when we train a new largest model.

Another serious problem with alignment is weak-to-strong generalization, where we try to use a weaker model to align a stronger model. With CoT, we can avoid this problem by making the weaker model stronger by giving it more inference-time compute.
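One way to write down the assumption this argument leans on (my paraphrase, in notation the comment itself doesn't use): letting cap(M, c) denote model M's capability on a verifiable task at inference budget c, the safety case is

```latex
% Assumed premise (not a theorem): for tasks in the "CoT works well" class,
% extra inference compute for the already-tested model dominates one
% generation of pretraining gains, i.e. for every such task
\[
  \mathrm{cap}\bigl(\text{GPT-}N,\ 100c\bigr) \;\ge\; \mathrm{cap}\bigl(\text{GPT-}(N{+}1),\ c\bigr).
\]
% If the left-hand side cannot design the weapon at budget 100c, then the
% right-hand side cannot at budget c; the safety case is only as strong as
% this inequality.
```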

Interestingly, Terence Tao has recently started thinking about AI, and his (publicly stated) opinions on it are ... very conservative? I find he mostly focuses on the capabilities that are already here and doesn't really extrapolate from it in any significant way.

1O O
Really? He seems pretty bullish. He thinks it will co-author math papers pretty soon. I think he just doesn't think about, or at least doesn't state, his thoughts on the implications outside of math.
Amalthea1-1

I think you're getting this exactly wrong (and this invalidates most of the OP). If you find a model that has a constant factor of 100 in the asymptotics, that's a huge deal if everything else has log scaling. That would already represent discontinuous progress and potentially put you at ASI right away.

Basically, the current scaling laws, if they keep holding, are a lower bound on the expected progress; they can't really give you any information to upper-bound it.
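As a toy illustration of why a constant factor is such a big deal under log scaling (my own sketch, with assumed functional forms): if capability grows as P(C) = k log C in compute C, then a method whose asymptotics carry an extra factor of 100 is equivalent to raising compute to the hundredth power,

```latex
\[
  P(C) = k \log C, \qquad P'(C) = 100\,k \log C
  \quad\Longrightarrow\quad
  P'(C) = P\!\left(C^{100}\right),
\]
```

so no feasible amount of scaling within the old paradigm closes that gap, which is why such a discovery would look discontinuous rather than like another point on the existing curve.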

It's certainly covered by the NYT, although their angle is "OpenAI is growing up".

I understand that this hit a bad tone, but I do kind of stand behind the original comment for the precise reason that Altman has been particularly good at weaponizing things like "fun" and "high status" which the OP plays right into.

2Zack_M_Davis
Thanks, I had copied the spelling from part of the OP, which currently says "Arnalt" eight times and "Arnault" seven times. I've now edited my comment (except the verbatim blockquote).
Amalthea129

I don't have any experience with actual situations where this could be relevant, but it does feel like you're overly focusing on the failure case where everyone is borderline incompetent and doing arbitrary things (which of course happens on LessWrong sometimes, since the variation here is quite large!). There's clearly a huge upside to being able to spot when you're trying to do something that's impossible for theoretical reasons, and to being extra sceptical in these situations (e.g. someone trying to construct a perpetual motion machine). I'm open to the argument that there's a lot to be wished for in the way people in practice apply these things.

Answer by Amalthea10

(Epistemic status: I know basic probability theory but am otherwise just applying common sense here)

This seems to mostly be a philosophical question. I believe the answer is that then you're hitting the limits of your model, and Bayesianism doesn't necessarily apply. In practical terms, I'd say it's most likely that you were mistaken about the probability of the event in fact being 0. (Probability 1 events occurring should be fine.)

2Noosphere89
Re probability 0 events, I'd say a good example of one is the question "What probability do we have to live in a universe with our specific fundamental constants?" Our current theory relies on 20+ real-number constants, but critically the probability of getting the constants we do have is always 0, no matter which values are picked, yet some values are in fact realized. Another example: the set of Turing machines for which we can't decide halting or non-halting is a probability 0 set, but that doesn't allow us to construct a Turing machine that decides whether another arbitrary Turing machine halts, for well-known reasons. (This follows from the fact that the set of Turing machines with a decidable halting problem has probability 1): https://arxiv.org/abs/math/0504351
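A minimal sketch of the underlying measure-theoretic point (my addition, using a uniform variable rather than physical constants): for a continuous random variable every exact value has probability zero, yet some value is always realized, so probability 0 cannot mean impossible.

```latex
\[
  X \sim \mathrm{Uniform}(0,1): \qquad
  \Pr(X = x) = 0 \ \text{ for every } x \in [0,1],
  \qquad \text{yet} \qquad
  \Pr\bigl(X \in [0,1]\bigr) = 1.
\]
```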
Amalthea*60

In 2021, I predicted math to be basically solved by 2023 (using the kind of reinforcement learning on formally checkable proofs that DeepMind is using). It's been slower than expected, and I wouldn't have guessed that a less formal setting like o1 would go relatively well - but since then I just nod along to these kinds of results.

(Not sure what to think of that claimed 95% number though - wouldn't that kind of imply they'd blown past the IMO grand challenge? EDIT: There were significant time limits on the human participants, see Qumeric's comment.)

Amalthea1817

I think the main problem is that society-at-large doesn't significantly value AI safety research, and hence that the funding is severely constrained. I'd be surprised if the consideration you describe in the last paragraph plays a significant role.

2Noosphere89
I think it's more of a side effect of the FTX disaster, where people are no longer willing to donate to EA, which means that AI safety got hit particularly hard as a result.

Ideas come from unsupervised training, answers from supervised training and proofs from RL on a specified reward function.

1Abhimanyu Pallavi Sudhir
I think only particular reward functions, such as in multi-agent/co-operative environments (agents can include humans, like in RLHF) or in actually interactive proving environments?
Amalthea-10

Mask off, mask on.

Amalthea30

I think your concrete suggestions such as these are very good. I still don't think you have illustrated the power-seeking aspect you are claiming very well (it seems to be there for EA, but less so for AI safety in general).

In short, I think you are conveying certain important, substantive points, but are choosing a poor framing.

Amalthea10

Thanks for clarifying. I do agree with the broader point that one should have a sort of radical uncertainty about (e.g.) a post AGI world. I'm not sure I agree it's a big issue to leave that out of any given discussion though, since it shifts probability mass from any particular describable outcome to the big "anything can happen" area. (This might be what people mean by "Knightian uncertainty"?)

Amalthea10

I don't think it's unreasonable to distrust doom arguments for exactly this reason?

3Richard_Ngo
Yes, I'm saying it's a reasonable conclusion to draw, and the fact that it isn't drawn here is indicative of a kind of confirmation bias.
Amalthea52

I agree that dunking on OS communities has apparently not been helpful in these regards. It seems kind of orthogonal to being power-seeking though. Overall, I think part of the issue with AI safety is that the established actors (e.g. wide parts of CS academia) have opted out of taking a responsible stance, e.g. compared to recent developments in biosciences and RNA editing. Partially, one could blame this on them not wanting to identify too closely with, or grant legitimacy to, the existing AI safety community at the time. However, a priori, it seems more... (read more)

Amalthea-44

The casual boosting of Sam Altman here makes me quite uncomfortable, and there are probably better examples: one could argue that his job isn't "paying" him as much as he's "taking" things by unilateral action and being a less-than-trustworthy actor. Other than that, this was an interesting read!

SebastianG *137

The casual policing of positive comments about Sam Altman is unnecessary. Is this the Sam Altman sneer club? Grok the author's intent and choose your own example. SA is a polarizing figure, I get it. He can be a distraction from the point of an example, but in this case I thought it made sense.

It is something for authors to be on the lookout for though. Some examples invite "missing the point." Sam Altman is increasingly one example of a name that invites distracted thoughts other than the point intended.

Amalthea54

I found Ezra Vogel's biography of Deng Xiaoping to be on a comparable level.

1Geoffrey Irving
I can confirm it’s very good!
Amalthea10

On a brief reading, I found this to strike a refreshingly neutral and factual tone. I think it could be quite useful as a reference point.

1Kevin Kohler
Thanks - appreciated!
Amalthea0-1

You mean specifically that an LLM solved it? Otherwise DeepMind's work will give you many examples. (Although there have been surprisingly few breakthroughs in math yet.)

1Anders Lindström
Yes, I meant an LLM in the context of a user who fed in a query describing his or her problem and got a novel solution back. It is always debatable what a "real" or "hard" problem is, but as a lower bound I mean something where people here at LW would raise an eyebrow or two if the LLM solved it. Otherwise there are, as you mention, plenty of problems that "custom" AI/machine learning models have solved for a long time.

Note that LLMs, while general, are still very weak in many important senses.

Also, it's not necessary to assume that LLMs are lying in wait to turn treacherous. Another possibility is that trained LLMs lack the mental slack to even seriously entertain the possibility of bad behavior, but that this may well change with more capable AIs.

I am not claiming that the alignment situation is very clear at this point. I acknowledge that LLMs do not indicate that the problem is completely solved, and we will need to adjust our views as AI gets more capable.

I'm just asking people to acknowledge the evidence in front of their eyes, which (from my perspective) clearly contradicts the picture you'd get from a ton of AI alignment writing from before ~2019. This literature talked extensively about the difficulty of specifying goals in general AI in a way that avoided unintended side effects.

To the exte... (read more)

I agree with the first sentence. I agree with the second sentence with the caveat that it's not strong absolute evidence, but mostly applies to the given setting (which is exactly what I'm saying).

People aren't fixed entities and the quality of their contributions can vary over time and depend on context.

That said, it also appears to me that Eliezer is probably not the most careful reasoner, and indeed often appears (perhaps egregiously) overconfident. That doesn't mean one should begrudge people finding value in the Sequences, although it is certainly not ideal if people take them as mantras rather than as useful pointers and explainers for basic things (I didn't read them, so I might have an incorrect view here). There does appear to be some tendency to just link to some point made in the Sequences as if it were airtight, although I haven't found it too pervasive recently.
