All of Siebe's Comments + Replies

Siebe10

More thoughts:

I thought that AlphaZero was a counterpoint, but apparently it's significantly different. For example, it used true self-play, allowing it to discover fully novel strategies.

Then again, I don't think more sophisticated reasoning is the bottleneck to AGI (compared to executive function & tool use), so even if reasoning doesn't really improve for a few years, we could get AGI.

However, I previously thought reasoning models could be leveraged to figure out how to achieve actions, and then the best actions would be distilled into a better agent model, you know, IDA-style. But this paper makes me more skeptical of that working, because these agentic steps might require novel skills that aren't inside the training data.

Siebe*80

Yes, it matters for current model performance, but it means that RLVR isn't actually improving the model in a way that can be used for an iterated distillation & amplification loop, because it doesn't actually do real amplification. If this turns out right, it's quite bearish for AI timelines.

Edit: Ah, someone just alerted me to the crucial consideration that this was tested using smaller models (like Qwen-2.5 (7B/14B/32B) and LLaMA-3.1-8B), which are significantly smaller than the models where RLVR has shown the most dramatic improvements (like DeepSeek-V... (read more)
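For readers who want the referenced loop spelled out, here is a toy sketch of iterated distillation & amplification (my illustration, not from the comment or any paper; the "model" is reduced to its accuracy on a task, and best-of-n sampling against a verifier stands in for amplification):

```python
# Toy model of an iterated distillation & amplification (IDA) loop.
# Illustrative assumptions only: the "model" is just its probability of
# solving a task, "amplification" is best-of-n sampling against a
# verifier (the RLVR setting), and distillation is assumed lossless.

def amplify(p_correct: float, n: int = 16) -> float:
    """P(at least one of n independent samples is correct)."""
    return 1.0 - (1.0 - p_correct) ** n

def ida(p_correct: float, iterations: int) -> float:
    for i in range(iterations):
        # Distill: the next model is trained to match the amplified
        # performance directly, without the extra samples at inference.
        p_correct = amplify(p_correct)
        print(f"iteration {i}: p_correct = {p_correct:.4f}")
    return p_correct

ida(0.05, 3)  # 0.05 -> 0.56 -> ~1.0: rapid gains IF sampling can reach
              # answers the base model couldn't otherwise produce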
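```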
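Read through this toy lens, the worry above is that if RLVR only renormalizes probability mass over answers the base model could already sample, amplification stops surfacing genuinely new behavior, and the loop has nothing novel to distill.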

Siebe30

That's good to know.

For what it's worth, ME/CFS (a disease/cluster of specific symptoms) is quite different from idiopathic chronic fatigue (a single symptom). Confusing the two is one of the major issues in the literature. Many people with ME/CFS, like me, don't even have 'feeling tired' as a symptom, which is why I avoid the term CFS.

Siebe30

I haven't looked into this literature, but it sounds remarkably similar to the literature of cognitive behavioral therapy and graded exercise therapy for ME/CFS (also sometimes referred to as 'chronic fatigue syndrome'). I can imagine this being different for pain, which could be under more direct neurological control.

Pretty much universally, this research was of low to very low quality. For example, using overly broad inclusion criteria such that many patients did not have the core symptom of ME/CFS, and only reporting subjective scores (which tend to impr... (read more)

3SoerenMind
Interesting to know more about the CFS literature here. Like you, I haven't found as much good research on it, at least with a quick search. (Though there's at least one pretty canonical reference connecting chronic fatigue and nociplastic pain FWIW.)

The research on neuroplastic pain seems to have a stronger evidence base. For example, some studies have 'very large' effect sizes (compared to placebo), publications with thousands of citations or in top-tier journals, official recognition by the leading scientific body on pain research (IASP), and keynote talks at the mainstream academic conferences on pain research.

Spontaneous healing and placebo effects happen all the time of course. But in the cases I know, it was often very unlikely to happen at the exact time of treatment. Clear improvement was often timed precisely to the day, hour or even minute of treatments. In my case, a single psychotherapy session brought me from ~25% to ~85% improvement for leg pain, in both knees at once, after it had lasted for years. Similar things happened with other pains in a short amount of time after they had lasted for between 4 and 30 months.

> Lastly, ignoring symptoms can be pretty dangerous so I recommend caution with the approach

I also fear that knowing about neuroplastic pain will lead certain types of people to ignore physical problems and suffer serious damage.
Siebe30

I'm starting a discussion group on Signal to explore and understand the democratic backsliding of the US at ‘gears-level’. We will avoid simply discussing the latest outrageous thing in the news, unless that news is relevant to democratic backsliding.

Example questions:

  • "how far will SCOTUS support Trump's executive overreach?"

  • "what happens if Trump commands the military to support electoral fraud?"

  • "how does this interact with potentially short AGI timelines?"

  • "what would an authoritarian successor to Trump look like?"

  • "are there any neglected,

... (read more)
Siebe2-15

One way to operationalize "160 years of human time" is "thing that can be achieved by a 160-person organisation in 1 year", which seems like it would make sense?

1Mo Putera
Not if some critical paths are irreducibly serial.
1Rachel Shu
Possibly, but then you have to consider you can spin up possibly arbitrarily many instances of the LLM as well, in which case you might expect the trend to go even faster, as now you’re scaling on 2 axes, and we know parallel compute scales exceptionally well. Parallel years don’t trade off exactly with years in series, but “20 people given 8 years” might do much more than 160 given one, or 1 given 160, depending on the task.
8ErioirE
Unfortunately, when dealing with tasks such as software development it is nowhere near as linear as that. The meta-tasks of each additional dev needing to be brought up to speed on the intricacies of the project, as well as lost efficiency from poor communication/waiting on others to finish things, mean you usually get diminishing (or even inverse) returns from adding more people to the project. See: The Mythical Man-Month.
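A rough way to quantify the disagreement in this subthread is Amdahl's law (a standard formula, not from the thread; the 10% serial fraction below is an assumption for illustration):

```python
# Amdahl's law: if a fraction `serial` of a project is irreducibly serial,
# `workers` parallel workers speed the whole job up by at most
# 1 / (serial + (1 - serial) / workers). The 10% serial fraction below
# is an assumed illustration, not a measured number.

def speedup(serial: float, workers: int) -> float:
    return 1.0 / (serial + (1.0 - serial) / workers)

for n in (1, 20, 160):
    print(f"{n:>3} workers: {speedup(0.10, n):4.1f}x")

# 160 workers yield only ~9.5x, so "160 years of human time" would take
# the 160-person organisation ~17 years, not 1.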
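```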
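On this picture, whether the operationalization works hinges almost entirely on the task's serial fraction, which is exactly what Mo Putera and ErioirE are pointing at.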
Siebe70

This makes me wonder if it's possible that "evil personas" can be entirely eliminated from distilled models, by including positive/aligned intent labels/traces throughout the whole distillation dataset

Siebe11

Seems to me the name AI safety is currently still widely used, no? It covers much more than just alignment strategies, since it also includes things like control and governance.

2habryka
That's a pretty recent thing! Agree that it has become more used recently (in the last 1-2 years) for practical reasons.
Siebe1-4

> The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.

Bad faith

> There are also factions concerned about "misinformation" and "algorithmic bias," which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.

Bad faith

> AI Doomer coalition abandoned the name "AI safety" and rebranded itself to "AI alignment."

Seems wrong

3habryka
(Why do you believe this? I think this is a reasonable gloss of what happened around 2015-2016. I was part of many of those conversations, as I was also part of many of the conversations in which I and others gave up on "AI Alignment" as a thing that could meaningfully describe efforts around existential risk reduction.)
Siebe80

What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.

Siebe54

This is very interesting, and I had a recent thought that's very similar:

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this to work is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent it from properly modelling other agents that aren't trained on this, but it's not obvious to me that that's going to happen or that it's such a decisively bad thing

... (read more)
6Milan W
Reiterating my intention to just do this (data seeding) and my call for critiques before I proceed:
4Milan W
The concerns about data filtering raised in that post's comments[1] suggest doing aligned-CoT-seeding on the pretraining data may be a better thing to try instead.

  1. ^

    ex.: Jozdien citing gwern
3Milan W
This is indeed pretty relevant.
Siebe30

I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch

3Milan W
Maybe in isolation, but I get the feeling that time is of the essence.
Siebe165

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this to work is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent it from properly modelling other agents that aren't trained on this, but it's not obvious to me that that's going to happen or that it's such a decisively bad thing to outweigh the positives

1CstineSublime
I'll raise you an even stupider question: surely once an A.I. becomes sufficiently super-intelligent, all superintelligent systems will converge on certain values rather than be biased towards their initial training data? The expectations we condition it with via these first-person stories about what it did will soon form only a small fraction of its corpus, as it interacts with the outside world and forms its own models of the world, right? I mean, the way people talk about post-Singularity A.I. that can either bring about utopia, or drop all of the bombs and launch wave after wave of robot minions upon us - surely that means that it is capable of fast learning feedback loops, right? (Although maybe I'm mistaken, and what they mean is a plethora of domain-specific superintelligences, not a single all-benevolent one?)

My understanding of AGI, not superintelligence, is an AI that can do the breadth of tasks a functional adult human can do. Now, that doesn't mean all the same tasks, but a similar degree of flexibility. Right? Put it in control of a robot arm and a baseball bat, and an AGI will teach itself how to hit a baseball as opposed to being trained by its operators how to do it; it will have metacognitive abilities that will allow it to create a learning feedback loop.

Now if it has metacognition, then chances are it has the ability to change its own goals - just like people. Now imagine a therapy AGI - one day it is talking to a patient and then realizes (or thinks it realizes) that it understands the patient's goals and values better than the patient, and seeks to deceive or manipulate the patient towards the patient's own best interest. Let's say the patient is suicidal, and the AGI knows a way to outsmart the patient out of this action. Again, it has the ability to change its own goals.

I mean, maybe it will be beholden to the initial training data? Maybe it will have an existential crisis just like us? Analysis Paralysis and Ataxia brought on by inne
8Milan W
I have had this idea for a while. Seems like a good thing to do, looking from a simulators/direct value alignment frame. Might make corrigibility harder depending on exact implementation. Still, I'd expect it to be net-positive. Invitation for critiques: If nobody convinces me it's a bad idea in a week's time from posting, I'll just proceed to implementation.
Siebe4224

I think you should publicly commit to:

  • full transparency about any funding from for-profit organisations, including nonprofit organizations affiliated with for-profits
  • no access to the benchmarks for any company
  • no NDAs around this stuff

If you currently have any of these arrangements around the computer use benchmark in development, you should seriously try to get out of those contractual obligations.

Ideally, you commit to these in a legally binding way, which would make it non-negotiable in any negotiation, and make you more credible to outsiders.

We could also ask whether these situations exist ("is there any funder you have that you didn't disclose?" and so on, especially around NDAs), and Epoch could respond with Yes/No/Can't Reply[1].

Also seems relevant for other orgs.

This would only patch the kind of problems we can easily think about, but it seems to me like a good start

 

  1. ^

    I learned that trick from hpmor!

Siebe1717

I don't think that all media produced by AI-risk-concerned people needs to mention that AI risk is a big deal - that just seems annoying and preachy. I see Epoch's impact story as informing people of where AI is likely to go and what's likely to happen, and this works fine even if they don't explicitly discuss AI risk.

I don't think that every podcast episode should mention AI risk, but it would be pretty weird in my eyes to never mention it. Listeners would understandably infer that "these well-informed people apparently don't really worry much, maybe I ... (read more)

Satron100

> Listeners would understandably infer that "these well-informed people apparently don't really worry much, maybe I shouldn't worry much either".

I think this is countered to a great extent by all the well-informed people who worry a lot about AI risk. I think the "well-informed people apparently disagree on this topic, I better look into it myself" environment promotes inquiry and is generally good for truth-seeking.

More generally, I agree with @Neel Nanda: it seems somewhat doubtful that people listening to a very niche Epoch Podcast aren't aware of all the smart people worried about AI risk.

Siebe40

This is a really good comment. A few thoughts:

  1. Deployment had a couple of benefits: real-world use gives a lot of feedback on strengths, weaknesses, jailbreaks. It also generates media/hype that's good for attracting further investors (assuming OpenAI will want more investment in the future?)

  2. The approach you describe is not only useful for solving more difficult questions. It's probably also better at doing more complex tasks, which in my opinion is a trickier issue to solve. According to Flo Crivello:

> We're starting to switch all our agentic steps

... (read more)
3Nathan Helm-Burger
So maybe you only want a relatively small amount of use, that really pushes the boundaries of what the model is capable of. So maybe you offer to let scientists apply to "safety test" your model, under strict secrecy agreements, rather than deploy it to the public. Oh.
Siebe-1-3

I didn't read the post, but just FYI that an automated AI R&D system already exists, and it's open-source: https://github.com/ShengranHu/ADAS/

I wrote the following comment about my safety concerns and notified Haize, Apollo, METR, and GovAI, but only Haize replied: https://github.com/ShengranHu/ADAS/issues/16#issuecomment-2354703344

Siebe33

This Washington Post article supports the 'Scheming Sam' hypothesis: anonymous reports, mostly from his time at Y Combinator.

Siebe15

Meta's actions seem unrelated?

Siebe1614

Just coming to this now, after Altman's firing (which seems unrelated?)

> At age 5, she began waking up in the middle of the night, needing to take a bath to calm her anxiety. By 6, she thought about suicide, though she didn't know the word.

To me, this adds a lot of validity to the whole story and I haven't seen these points made:

  1. Becoming suicidal at such an early age isn't normal, and very likely has a strong environmental cause (like being abused, or losing a loved one)

  2. The bathing to relieve anxiety is typical sexual trauma behavior (e.g. https:/

... (read more)
Siebe10

Except that herd immunity isn't really a permanent thing, only a temporary one.

1GeneSmith
True. It would be interesting to know how much a single (non-fatal) infection reduces the odds of mortality from subsequent infections. My guess is more than a single vaccination, but it's probably unlikely it can decrease it by as much as a 3-dose mRNA vaccine regimen.
Siebe10

I had not seen it, because I don't read this forum these days. I can't reply in too much detail, but here are some points:

I think it's a decent attempt, but a little biased towards the "statistically clever" estimate. I do agree that many studies are pretty poorly done. However, I've seen good ones that include controls, confirm infection via PCR, are large, and have pre-pandemic health data. This was in a Dutch presentation of a data set though, and not clearly reported for some reason. (This is the project, but their data is not publicly available: https://www.li... (read more)

Siebe10

Yes, vaccine injury is actually rather common - I've seen a lot of very credible case reports reporting either initiation of symptoms after vaccination (after having been infected), or more often worsening of symptoms. Top long COVID researchers also believe these.

I don't think the data for keto is that strong. Plenty of people with long COVID are trying it with not amazing results.

0superads91
"I've seen a lot of very credible case reports reporting either initiation of symptoms since vaccine (after having been infected), or more often worsening of symptoms. Top long COVID researchers also believe these." Interesting! "I don't think the data for keto is that strong. Plenty of people with long COVID are trying it with not amazing results." Keto + intermittent fasting + elimination diet + vitamin D3+K2. Often all of these 4 are needed. Which one is more important depends on the chronic disease. For instance, I've heard from EA sources how vitamin D3 supplementation alone has massive success in curing cluster headaches (one of the most painful conditions) where medications have very little success. Or how elimination diet is the deciding factor for some auto-immune diseases. Or how my mom suffered from horrible body pains from years, having seen dozens of doctors, until one told her to go do yoga, and after 6 months all pain was gone and has remained so for the last 15 years. I'm not doubting that many with long COVID might indeed fail even after implementing all 4. But when you're desperate you try everything. Sometimes the cure might be what seems like a trivial lifestyle change - I've seen it a thousand times.
Siebe30

The 15% is an upper estimate of the share of people reporting 'some loss' of health, so not everyone would be severely disabled.

Unfortunately, the data isn't great, and I can't produce a robust estimate right now

3Sameerishere
FYI,  Alyssa Vance provided additional disability statistics https://www.lesswrong.com/posts/4z3FBfmEHmqnz3NEY/long-covid-risk-how-to-maintain-an-up-to-date-risk?commentId=GKmqE9PKXfRSKb5PC which suggest "serious, long-term illness from COVID is pretty unlikely."  Siebe, I would be interested to hear your take on that, since you seem to have a substantially more pessimistic view of this.
Siebe10

Uhm, no? I'm quoting you on the middle category, which overlaps with the long category.

Also, there's no need to speculate, because there have been studies linking severity and viral load to increased risk of long COVID. https://www.cell.com/cell/fulltext/S0092-8674(22)00072-1

7DirectedEvolution
I see what you mean. The study's criteria, which I didn't quote here, states that the earliest time at which the respondant met any of the conditions for a COVID infection should be counted. I remain confused (not by you, by the UK study)! I don't see myself as speculating, so much as emphasizing that contradictory evidence exists even about the association, not to mention causality.
Siebe30

You have far more faith in the rationality of government decision making during novel crises than I do.

Healthcare workers with long covid can barely work, or often not at all.

Lowering infection rates, remaining able to work, and not needing to make high demands on the healthcare system seems much better for the economy. This is not an infohazard at all.

Siebe40

Awesome in-depth response! Yes, I was hoping this post would serve as an initial alarm bell to look further into, rather than being definitive advice based on a comprehensive literature review.

I can't respond to everything, at least not at once, but here's some:

  • categories of 'at least 12 weeks' and 'at least 1 year' do overlap, right?
  • I think the different waves may have had different underreporting factors, with the least underreporting during Delta, so we can't take those rates at face value, and I prefer using estimated cases whenever possible
1DirectedEvolution
The wording is “less than 12 weeks” rather than “at least 12 weeks,” so the categories shouldn’t overlap, time wise. Under the theory that omicron is underreported and delta more accurately reported, this bolsters the case for long COVID being linked to disease severity - with the caveat about the percentages not adding to 100% in mind.
Siebe30

See figure 2 of this large scale survey: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/7october2021

"As a proportion of the UK population, prevalence of self-reported long COVID was greatest in people aged 35 to 69 years, females, people living in more deprived areas, those working in health or social care, and those with another activity-limiting health condition or disability"

Siebe82

No, these problems are most probably caused by a lack of oxygen getting through to tissues. There's a large number of patients reporting these severe symptoms in patient groups, and they're not elderly.

It honestly feels to me like you really want to believe long COVID isn't a big deal somehow.

4johnswentworth
It's not that I don't want to believe it, it's that long covid is the sort of thing I'd expect to hear people talk about and publish papers about even in a world where it isn't actually significant, and many of those papers would have statistically-significant positive results even in a world where long covid isn't actually significant. Long covid is a story which has too much memetic fitness independent of its truth value. So I have to apply enough skepticism that I wouldn't believe it in a world where it isn't actually significant.

That sounds right for shortness of breath, chest pain, and low oxygen levels. I'm more skeptical that it's driving palpitations, fatigue, joint and muscle pain, brain fog, lack of concentration, forgetfulness, sleep disturbance, and digestive and kidney problems; those sound a lot more like a list of old-age issues.
Siebe60

In addition, we know that 100% of patients with long COVID have microclots, at least in this study: https://www.researchsquare.com/article/rs-1205453/v1

Interestingly, they diagnosed patients not via PCR or antibodies, but based on exclusion and symptom diagnosis:

"Patients gave consent to study their blood samples, following clinical examination and/or after filling in the South African Long COVID/PASC registry. Symptoms must have been new and persistent symptoms noted after acute COVID-19. Initial patient diagnosis was the end result of exclusions, only a... (read more)

1johnswentworth
This mostly sounds like age-related problems. I do expect generic age-related pathologies to be accelerated by covid (or any other major stressor), but if that's the bulk of what's going on, then I'd say "long covid" is a mischaracterization. It wouldn't be relevant to non-elderly people, and to elderly people it would be effectively the same as any other serious stressor.
Siebe*672

That French study is bunk.

Seropositivity is NOT AT ALL a good indicator for having had covid: https://wwwnc.cdc.gov/eid/article/27/9/21-1042_article

It is entirely possible that all those patients who believe they had COVID are right.

Some researchers believe absence of antibodies after infection is positively correlated with long covid (I don't have a source).

This study is bunk and it's harmful for adequate treatment of seronegative patients. The psychosomatic narrative has been a lazy answer stifling solid scientific research into illnesses that are not well understood yet.

1EGI
Same problem as with Lyme disease. Weak or no antibody reaction is only good news IF it indicates absence of the pathogen. While this is not unreasonable to assume, it still needs to be demonstrated, preferably over a wide variety of different tissues.
3Zvi
Writing up a Long Covid post and noticed this. Several things, even taking the study here at face value. Putting this here as a 'preprint' basically to see if there are counterarguments. And regardless, thanks for the link; it should be considered, but I do not think this constitutes bunk.

One, everyone with a Ct of about 25 or lower got antibodies, so we're talking about light cases or outright false positives that then didn't get antibodies. And the spike in cases of Ct~37 is weird enough that I suspect something wrong with the PCRs.

Two, this implies that a positive antibody test still means Covid (no false positives, only false negatives), so it would take a VERY large correlation with long Covid to have no correlation show up in the final data - keep in mind that Ct<25 still meant full positives later, so the correlation here can't be that big.

Three, we'd basically have to assume that virus count isn't linked to chance of long Covid or this doesn't make any sense, because all the high virus count cases are getting positives anyway. But lots of virus seems like it would be more likely to lead to long Covid because physics?

Also, from the French paper they use this source: https://pubmed.ncbi.nlm.nih.gov/33139419/ which reports tests have high accuracy and has >10x the sample size of the one linked above.

My interpretation of the linked study here is 'sufficiently mild cases sometimes don't generate antibodies but show up on PCR, and/or PCR tests are getting false positives and we should not take Ct>30 very seriously'. E.g. from here: the bulk of the issues were in Ct values >=32.

Anyone have more thoughts?
4RobertM
I don't see any way in which the results of the French study are incompatible with a 64% true positive rate on "did this person previously have covid". (Also, a 64% true positive rate is actually decent Bayesian evidence for having had covid, assuming a sufficiently large % of the underlying population has had covid, such that whatever the false positive rate is doesn't cause most/all of your positives to be false positives.)

Strong upvote, this is great info.
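To make RobertM's parenthetical concrete, here is a minimal sketch of the Bayes computation (the 64% true positive rate is from the thread; the 30% prior prevalence and 99% specificity are assumptions for illustration only):

```python
# Minimal sketch of the Bayesian-evidence point. The 64% sensitivity
# comes from the thread; the 30% prior prevalence and 99% specificity
# are illustrative assumptions, not numbers from the study.

def posterior_had_covid(prior: float, sensitivity: float, specificity: float) -> float:
    """P(had covid | positive antibody test), by Bayes' rule."""
    p_pos = prior * sensitivity + (1.0 - prior) * (1.0 - specificity)
    return prior * sensitivity / p_pos

# A positive test moves you from a 30% prior to ~96% -- decent evidence
# even with a 64% true positive rate, as long as false positives are rare.
print(round(posterior_had_covid(0.30, 0.64, 0.99), 2))  # 0.96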
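```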
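As the comment notes, this only holds while the false positive rate stays low relative to prevalence; with rare underlying infection, most positives would be false ones.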

Siebe10

Seropositivity is also not a good indicator for having had covid: https://wwwnc.cdc.gov/eid/article/27/9/21-1042_article

Some researchers believe absence of antibodies after infection is positively correlated with long covid (I don't have a source).

This study is bunk and it's harmful for adequate treatment of seronegative patients.

Siebe20

This was very informative!

How would you translate this into a heuristic? And how much do I need to have a secondary skill myself, rather than finding a partner who has a great complementary skill?

3johnswentworth
This ties into Pattern's comment too. Spreading out the skills across people introduces a bunch of problems:

  • For the sort of problems which lend themselves to breakthroughs in the first place, the key is often one discrete insight. There's no good way to modularize the problem; breaking it up won't help find the key piece. (This is a GEM consequence: if it's modularizable, it's probably already been modularized.)

  • Group dynamics: Isaac Asimov wrote a great piece about this. Creative problem-solving requires an exploratory mindset, and you need the right sort of group setup to support that. Also it doesn't scale well with group size.

  • Translation: different specialties use different jargon, and somebody needs to do the work of translating. Translation can be spread across two people, but that means spending a lot of time on "hey what's the word for a crunchy sweet red fruit that's sort of spherical?" It's much faster if one person knows both languages.

  • Unknown unknowns: if each person only knows one field well, then there may be a solution in one field for a problem in the other, and neither person even thinks to bring it up. It's tough to know what kinds of things are available in a field you don't know.

All that said, obviously working in groups can theoretically leverage scale with less personal cost. Heuristics left as an exercise to the reader.
Siebe100

I am not sure why you believe good strategy research always has infohazards. That's a very strong claim. Strategy research is broader than 'how should we deal with other agents'. Do you think Drexler's Reframing Superintelligence: Comprehensive AI Systems or The Unilateralist's Curse were negative expected value? Because I would classify them as public, good strategy research with a positive expected value.

Are there any specific types of infohazards you're thinking of? (E.g. informing unaligned actors, getting media attention and negative public opinion)

3Jan Kulveit
Depends on what you mean by public. While I don't think you can have good public research processes which would not run into infohazards, you can have a nonpublic process which produces good public outcomes. I don't think the examples count as something public - e.g. do you see any public discussion leading to CAIS?
Siebe100

I agree with you that #3 seems the most valuable option, and you are correct that we aren't as plugged in - although I am much less plugged in (yet) than the other two authors. I hope to learn more in the future about

  • How much explicit strategy research is actually going on behind close doors, rather than just people talking and sharing implicit models.
  • How much of all potential strategy research should be private, and how much should be public. My current belief is that more strategy research should be public than private, but my understanding of info ha
... (read more)
Siebe10

I'm not sure I understand what Allan is suggesting, but it feels pretty similar to what you're saying. Can you perhaps explain your understanding of how his take differs from yours?

I believe he suggests that there is a large space that contains strategically important information. However, rather than first trying to structure that space and trying to find the questions with the most valuable answers, he suggests that researchers should just try their hand at finding anything of value. Probably for two reasons:

  1. By trying to find anything of value, you
... (read more)