Yes, it matters for current model performance, but it means that RLVR isn't actually improving the model in a way that can be used in an iterated distillation & amplification loop, because it doesn't do real amplification. If this turns out to be right, it's quite bearish for AI timelines.
Edit: Ah, someone just alerted me to the crucial consideration that this was tested using smaller models (like Qwen-2.5 (7B/14B/32B) and LLaMA-3.1-8B), which are significantly smaller than the models where RLVR has shown the most dramatic improvements (like DeepSeek-V...
That's good to know.
For what it's worth, ME/CFS (a disease/cluster of specific symptoms) is quite different from idiopathic chronic fatigue (a single symptom). Confusing the two is one of the major issues in the literature. Many people with ME/CFS, myself included, don't even have 'feeling tired' as a symptom, which is why I avoid the term CFS.
I haven't looked into this literature, but it sounds remarkably similar to the literature on cognitive behavioral therapy and graded exercise therapy for ME/CFS (also sometimes referred to as 'chronic fatigue syndrome'). I can imagine this being different for pain, which could be under more direct neurological control.
Pretty much universally, this research was of low to very low quality. For example, using overly broad inclusion criteria such that many patients did not have the core symptom of ME/CFS, and only reporting subjective scores (which tend to impr...
I'm starting a discussion group on Signal to explore and understand the democratic backsliding of the US at ‘gears-level’. We will avoid simply discussing the latest outrageous thing in the news, unless that news is relevant to democratic backsliding.
Example questions:
"how far will SCOTUS support Trump's executive overreach?"
"what happens if Trump commands the military to support electoral fraud?"
"how does this interact with potentially short AGI timelines?"
"what would an authoritarian successor to Trump look like?"
"are there any neglected,
One way to operationalize "160 years of human time" is "thing that can be achieved by a 160-person organisation in 1 year", which seems like it would make sense?
This makes me wonder whether "evil personas" could be eliminated entirely from distilled models by including positive/aligned intent labels/traces throughout the whole distillation dataset.
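A minimal sketch of what I have in mind, purely illustrative (the JSONL format, field names, and the wording of the intent trace are my own assumptions, not an established recipe):

```python
# Minimal sketch (hypothetical): prepend an explicit aligned-intent trace to every
# teacher response in a distillation dataset before fine-tuning the student model.
import json

ALIGNED_TRACE = (
    "My goal is to be honest, helpful, and harmless; "
    "with that intent, here is my answer:"
)

def add_intent_traces(in_path: str, out_path: str) -> None:
    """Rewrite each {"prompt", "response"} JSONL example so the response
    opens with the aligned-intent trace."""
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            example = json.loads(line)
            example["response"] = f"{ALIGNED_TRACE}\n{example['response']}"
            f_out.write(json.dumps(example) + "\n")

# e.g. add_intent_traces("teacher_outputs.jsonl", "teacher_outputs_aligned.jsonl")
```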
Seems to me the name "AI safety" is currently still widely used, no? It covers much more than just alignment strategies, since it also includes things like control and governance.
The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.
Bad faith
There are also factions concerned about “misinformation” and “algorithmic bias,” which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.
Bad faith
AI Doomer coalition abandoned the name “AI safety” and rebranded itself to “AI alignment.”
Seems wrong
What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.
This is very interesting, and I had a recent thought that's very similar:
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
...The way I imagine this to work is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent it from properly modelling other agents that aren't trained on this, but it's not obvious to me that that's going to happen or that it's such a decisively bad thing
Looks like Evan Hubinger has done some very similar research just recently: https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward
I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this to work is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". It might prevent it from properly modelling other agents that aren't trained on this, but it's not obvious to me that that's going to happen, or that it would be such a decisively bad thing as to outweigh the positives.
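If someone wanted to prototype this, a minimal sketch might look like the following (the story file, mixing rate, and field names are all assumptions for illustration, not a tested setup):

```python
# Minimal sketch (hypothetical): interleave synthetic first-person stories of
# desirable ASI behavior into a pretraining document stream at a fixed rate.
import json
import random

def mix_in_stories(corpus_path: str, stories_path: str, story_rate: float = 0.05):
    """Yield pretraining documents from `corpus_path`; after roughly
    `story_rate` of them, also yield one randomly chosen aligned-ASI story."""
    with open(stories_path) as f:
        story_pool = [json.loads(line)["text"] for line in f]
    with open(corpus_path) as f:
        for line in f:
            yield json.loads(line)["text"]
            if story_pool and random.random() < story_rate:
                yield random.choice(story_pool)

# e.g. for doc in mix_in_stories("pretrain.jsonl", "asi_stories.jsonl"): ...
```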
I think you should publicly commit to:
If you currently have any such arrangements with the computer use benchmark in development, you should seriously try to get out of those contractual obligations.
Ideally, you would commit to these in a legally binding way, which would make them non-negotiable in any negotiation and make you more credible to outsiders.
We could also ask if these situations exist ("is there any funder you have that you didn't disclose?" and so on, especially around NDAs), and Epoch could respond with Yes/No/Can'tReply[1].
Also seems relevant for other orgs.
This would only patch the kind of problems we can easily think about, but it seems to me like a good start
I learned that trick from hpmor!
I don't think that all media produced by people concerned about AI risk needs to mention that AI risk is a big deal - that just seems annoying and preachy. I see Epoch's impact story as informing people of where AI is likely to go and what's likely to happen, and this works fine even if they don't explicitly discuss AI risk.
I don't think that every podcast episode should mention AI risk, but it would be pretty weird in my eyes to never mention it. Listeners would understandably infer that "these well-informed people apparently don't really worry much, maybe I ...
Listeners would understandably infer that "these well-informed people apparently don't really worry much, maybe I shouldn't worry much either".
I think this is countered to a great extent by all the well-informed people who worry a lot about AI risk. I think the "well-informed people apparently disagree on this topic, I better look into it myself" environment promotes inquiry and is generally good for truth-seeking.
More generally, I agree with @Neel Nanda: it seems doubtful that people listening to a very niche Epoch Podcast are unaware of all the smart people worried about AI risk.
This is a really good comment. A few thoughts:
Deployment has a couple of benefits: real-world use gives a lot of feedback on strengths, weaknesses, and jailbreaks. It also generates media coverage/hype that's good for attracting further investors (assuming OpenAI will want more investment in the future?).
The approach you describe is not only useful for solving more difficult questions. It's probably also better at doing more complex tasks, which in my opinion is a trickier issue to solve. According to Flo Crivello:
...We're starting to switch all our agentic steps
I didn't read the post, but just fyi that an automated AI R&D system already exists, and it's open-source: https://github.com/ShengranHu/ADAS/
I wrote the following comment about my safety concerns and notified Haize, Apollo, METR, and GovAI, but only Haize replied: https://github.com/ShengranHu/ADAS/issues/16#issuecomment-2354703344
This Washington Post article supports the 'Scheming Sam' hypothesis: anonymous reports, mostly from his time at Y Combinator
Meta's actions seem unrelated?
Just coming to this now, after Altman's firing (which seems unrelated?)
"At age 5, she began waking up in the middle of the night, needing to take a bath to calm her anxiety. By 6, she thought about suicide, though she didn’t know the word."
To me, this adds a lot of validity to the whole story and I haven't seen these points made:
Becoming suicidal at such an early age isn't normal, and very likely has a strong environmental cause (like being abused, or losing a loved one)
The bathing to relieve anxiety is typical sexual trauma behavior (e.g. https:/
Except that herd immunity isn't really a permanent thing; it's only temporary.
I had not seen it, because I don't read this forum these days. I can't reply in too much detail, but here are some points:
I think it's a decent attempt, but a little biased towards the "statistically clever" estimate. I do agree that many studies are pretty poorly done. However, I've seen good ones that include controls, confirm infection via PCR, are large, and have pre-pandemic health data. This was in a Dutch presentation of a dataset though, and not clearly reported for some reason. (This is the project, but their data is not publicly available: https://www.li...
Yes, vaccine injury is actually rather common - I've seen a lot of very credible case reports of either symptoms starting after vaccination (in people who had already been infected), or, more often, worsening of existing symptoms. Top long COVID researchers also believe these reports.
I don't think the data for keto is that strong. Plenty of people with long COVID are trying it with not amazing results.
The 15% is an upper estimate of people reporting 'some loss' of health, so not everyone in that group would be severely disabled.
Unfortunately, the data isn't great, and I can't produce a robust estimate right now
Uhm, no? I'm quoting you on the middle category, which overlaps with the long category.
Also, there's no need to speculate, because there have been studies linking severity and viral load to increased risk of long COVID. https://www.cell.com/cell/fulltext/S0092-8674(22)00072-1
You have far more faith in the rationality of government decision making during novel crises than I do.
Healthcare workers with long COVID can barely work, or often not at all.
Lowering infection rates, remaining able to work, and not making high demands on the healthcare system seem much better for the economy. This is not an infohazard at all.
Awesome in-depth response! Yes, I was hoping this post would serve as an initial alarm bell prompting further investigation, rather than as definitive advice based on a comprehensive literature review.
I can't respond to everything, at least not all at once, but here are some points:
See figure 2 of this large scale survey: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/7october2021
"As a proportion of the UK population, prevalence of self-reported long COVID was greatest in people aged 35 to 69 years, females, people living in more deprived areas, those working in health or social care, and those with another activity-limiting health condition or disability"
No, these problems are most probably caused by a lack of oxygen getting through to tissues. There are a large number of patients reporting these severe symptoms in patient groups, and they're not elderly.
It honestly feels to me like you really want to believe long COVID isn't a big deal somehow.
In addition, we know that 100% of patients with long COVID have microclots, at least in this study: https://www.researchsquare.com/article/rs-1205453/v1
Interestingly, they diagnosed patients not via PCR or antibodies, but based on exclusion and symptom diagnosis:
"Patients gave consent to study their blood samples, following clinical examination and/or after filling in the South African Long COVID/PASC registry. Symptoms must have been new and persistent symptoms noted after acute COVID-19. Initial patient diagnosis was the end result of exclusions, only a...
That French study is bunk.
Seropositivity is NOT AT ALL a good indicator for having had covid: https://wwwnc.cdc.gov/eid/article/27/9/21-1042_article
It is entirely possible that all those patients who believe they had COVID are right.
Some researchers believe absence of antibodies after infection is positively correlated with long covid (I don't have a source).
This study is bunk and it's harmful for adequate treatment of seronegative patients. The psychosomatic narrative has been a lazy answer stifling solid scientific research into illnesses that are not well understood yet.
Strong upvote, this is great info.
Seropositivity is also not a good indicator for having had covid: https://wwwnc.cdc.gov/eid/article/27/9/21-1042_article
Some researchers believe absence of antibodies after infection is positively correlated with long covid (I don't have a source).
This study is bunk and it's harmful for adequate treatment of seronegative patients.
This was very informative!
How would you translate this into a heuristic? And how much do I need to have a secondary skill myself, rather than finding a partner who has a great complementary skill?
I am not sure why you believe good strategy research always has infohazards. That's a very strong claim. Strategy research is broader than 'how should we deal with other agents'. Do you think Drexler's Reframing Superintelligence: Comprehensive AI Services or The Unilateralist's Curse had negative expected value? Because I would classify them as public, good strategy research with a positive expected value.
Are there any specific types of infohazards you're thinking of? (E.g. informing unaligned actors, getting media attention and negative public opinion)
I agree with you that #3 seems the most valuable option, and you are correct that we aren't as plugged in - although I am much less plugged in (yet) than the other two authors. I hope to learn more in the future about
I'm not sure I understand what Allan is suggesting, but it feels pretty similar to what you're saying. Can you perhaps explain your understanding of how his take differs from yours?
I believe he suggests that there is a large space that contains strategically important information. However, rather than first trying to structure that space and trying to find the questions with the most valuable answers, he suggests that researchers should just try their hand at finding anything of value. Probably for two reasons:
More thoughts:
I thought that AlphaZero was a counterpoint, but apparently it's significantly different. For example, it used true self-play, allowing it to discover fully novel strategies.
Then again, I don't think more sophisticated reasoning is the bottleneck to AGI (compared to executive function & tool use), so even if reasoning doesn't really improve for a few years, we could still get AGI.
However, I previously thought reasoning models could be leveraged to figure out how to achieve actions, and then the best actions would be distilled into a better agent ...