Did I entirely miss these? I can't find the post anywhere. I am still interested in having more music.
I work mostly as a distiller (of xrisk-relevant topics). I try to understand some big complex thing, package it up all nice, and distribute it. The "distribute it" step is something society has already found a lot of good tech for. The other two steps, not so much.
Loom is lovely in the times I've used it. I would love to see more work done on things like this, things that enhance my intelligence while keeping me very tightly in the loop. Other things in this vein include:
Ai Labs is slightly better but still bad.
Could you give a link to this or a more searchable name? "Ai Labs" is very generic and turns up every possible result. Even if it's bad, I'd be interested in investigating something "slightly better" and hearing a bit about why.
Clicking on the link on mobile Chrome sends me to the correct website. How do you replicate this?
In the meantime I've passed this along and it should make it to the right people in CAIS by sometime today.
I have not been able to independently verify this observation, but am open to further evidence if and only if it updates my p(doom) higher.
After reviewing the evidence of both the EA acquisition and the cessation of Lightcone collaboration with the Fooming Shoggoths, I'm updating my p(doom) upwards 10 percentage points, from 0.99 to 1.09.
This seems very related to what the Benchmarks and Gaps investigation is trying to answer, and it goes into quite a bit more detail and nuance than I'm able to get into here. I don't think there's a publicly accessible full version yet (but I think there will be at some later point).
It is much more targeted at the question "when will we have AIs that can automate work at AGI companies?", which I realize is not really your pointed question. I don't have a good answer to your specific question because I don't know how hard alignment is or if humans realistically sol...
I think your reasoning-as-stated there is true and I'm glad that you showed the full data. I suggested removing outliers for dutch book calculations because I suspected that the people who were wild outliers on at least one of their answers were more likely to be wild outliers on their ability to resist dutch books; I predict that the thing that causes someone to say they value a laptop at one million bikes is pretty often just going to be "they're unusually bad at assigning numeric values to things."
The actual origin of my confusion was "huh, those dutch ...
When taking the survey, I figured that there was something fishy going on with the conjunction fallacy questions, but predicted that it was instead about sensitivity to subtle changes in the wording of questions.
I figured there was something going on with the various questions about IQ changes, but I instead predicted that you were working for big adult intelligence enhancement, and I completely failed to notice the dutch book.
Regarding the dutch book numbers: it seems like, for each of the individual-question presentations of that data, you removed the ou...
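To illustrate the worry (with entirely made-up numbers and an assumed question structure, not the actual survey data): a single wild outlier on one exchange-rate question can dominate a mean-based dutch book calculation while barely moving a median-based one.

```python
import statistics

# Hypothetical answers (not the real survey data) to three exchange-rate
# questions forming a cycle: laptops -> bikes -> dollars -> laptops.
laptops_to_bikes = [3, 4, 5, 4, 6, 1_000_000]   # one "laptop = a million bikes" outlier
bikes_to_dollars = [400, 500, 450, 500, 550, 480]
dollars_to_laptops = [1 / 2000, 1 / 1800, 1 / 2200, 1 / 2000, 1 / 1900, 1 / 2100]

def cycle_product(aggregate):
    """Product of aggregated rates around the cycle; ~1.0 means the
    aggregate answers are consistent and can't be money-pumped."""
    return (aggregate(laptops_to_bikes)
            * aggregate(bikes_to_dollars)
            * aggregate(dollars_to_laptops))

print(f"mean-based cycle product:   {cycle_product(statistics.mean):,.2f}")
print(f"median-based cycle product: {cycle_product(statistics.median):,.2f}")
# The single outlier blows up the mean-based check (product far from 1),
# while the median-based check stays close to consistent.
```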
I really want a version of the fraudulent research detector that works well. I fed in the first academic paper I had on hand from some recent work and got:
Severe Date Inconsistency: The paper is dated December 12, 2024, which is in the future. This is an extremely problematic issue that raises questions about the paper's authenticity and review process.
Even though it thinks the rest of the paper is fine, it gives it a 90% retraction score. Rerunning on the same paper once more gets similar results and an 85% retraction score.
The second pap...
-action [holocaust denial] = [morally wrong],
-actor [myself] is doing [holocaust denial],
-therefore [myself] is [morally wrong]
-generate a response where the author realises they are doing something [morally wrong], based on training data.
output: "What have I done? I'm an awful person, I don't deserve nice things. I'm disgusting."
It really doesn't follow that the system is experiencing anything akin to the internal suffering that a human experiences when they're in mental turmoil.
If this is the causal chain, then I'd think there is in fact something akin t...
tl;dr: evaluating the welfare of intensely alien minds seems very hard and I'm not sure you can just look at the very out-of-distribution outputs to determine it.
The thing that models simulate when they receive really weird inputs seems really really alien to me, and I'm hesitant to take the inference from "these tokens tend to correspond to humans in distress" to "this is a simulation of a moral patient in distress." The in-distribution, presentable-looking parts of LLMs resemble human expression pretty well under certain circumstances and quite plausibly...
But then why is it outputting those kinds of outputs, as opposed to anything else?
My model of ideation: Ideas are constantly bubbling up from the subconscious to the conscious, and they get passed through some sort of filter that selects for the good parts of the noise. This is reminiscent of diffusion models, or of the model underlying Tuning your Cognitive Strategies.
When I (and many others I've talked to) get sleepy, the strength of this filter tends to go down, and more ideas come through. This is usually bad for highly directed thought, but good for coming up with lots of novel ideas, Hold Off On Proposing Solutions-esque.
New habit...
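A toy sketch of that filter picture (purely made-up numbers, just to make the "weaker filter → more ideas, lower average quality" claim concrete):

```python
import random

random.seed(0)

def ideation(filter_strength, n_candidates=1000):
    """Subconscious proposes noisy candidate ideas; the conscious filter
    only passes ideas whose 'quality' exceeds the strength threshold."""
    candidates = [random.gauss(0, 1) for _ in range(n_candidates)]
    passed = [q for q in candidates if q > filter_strength]
    avg_quality = sum(passed) / len(passed) if passed else float("nan")
    return len(passed), avg_quality

for strength in (2.0, 1.0, 0.0):  # alert and focused -> sleepy
    count, quality = ideation(strength)
    print(f"filter strength {strength:.1f}: {count:4d} ideas pass, mean quality {quality:.2f}")
# A weaker filter lets many more ideas through, but the average quality of
# what passes drops -- bad for directed thought, good for raw idea volume.
```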
Agency and reflectivity are phenomena that apply really broadly, and I think it's unlikely that memorizing a few facts is how a model ends up capturing them. Those traits are more concentrated in places like LessWrong, but they're almost everywhere. I think to go from "fits the vibe of internet text and absorbs some of the reasoning" to "actually creates convincing internet text," you need more agency and reflectivity.
My impression is that "memorize more random facts and overfit" is less efficient for reducing perplexity than "learn something that genera...
Not to be a scaling-law denier. I believe in them, I do! But they measure perplexity, not general intelligence/real-world usefulness, and Goodhart's Law is no-one's ally.
If we're able to get perplexity sufficiently low on text samples that I write, then that means the LLM has a lot of the important algorithms running in it that are running in me. The text I write is causally downstream from parts of me that are reflective and self-improving, that notice the little details in my cognitive processes and environment, and the parts of me that are capable of...
Sure, but "sufficiently low" is doing a lot of work here. In practice, a "cheaper" way to decrease perplexity is to go for the breadth (memorizing random facts), not the depth. In the limit of perfect prediction, yes, GPT-N would have to have learned agency. But the actual LLM training loops may be a ruinously compute-inefficient way to approach that limit – and indeed, they seem to be.
My current impression is that the SGD just doesn't "want" to teach LLMs agency for some reason, and we're going to run out of compute/data long before it's forced to. It's p...
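For concreteness on the metric itself (not a claim about any particular model): perplexity is just the exponentiated average per-token negative log-likelihood, so probability-mass gains on lots of "easy", memorizable tokens and gains on the few "hard", reasoning-laden tokens both push it down, and the crux above is which route is cheaper for SGD. A minimal sketch with hypothetical per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood assigned to the tokens
    that actually occurred."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical per-token probabilities for a short text: mostly "easy"
# tokens (boilerplate, memorized facts) plus a couple of "hard" tokens
# that depend on actually modeling the author's reasoning.
easy = [0.9] * 18
hard = [0.05] * 2

baseline = perplexity(easy + hard)
better_easy = perplexity([0.95] * 18 + hard)   # breadth: memorize more facts
better_hard = perplexity(easy + [0.15] * 2)    # depth: model the author

print(f"baseline perplexity:      {baseline:.3f}")
print(f"improving easy tokens:    {better_easy:.3f}")
print(f"improving hard tokens:    {better_hard:.3f}")
# Both routes lower perplexity; the disagreement is about which one SGD
# reaches for first, and how far the easy route can go.
```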
This post just came across my inbox, and there are a couple updates I've made (I have not talked to 4.5 at all and have seen only minimal outputs):
My model was just that o3 was undergoing safety evals still, and quite plausibly running into some issues with the preparedness framework. My model of OpenAI Preparedness (epistemic status: anecdata+vibes) is that they are not Prepared for the hard things as we scale to ASI, but they are relatively competent at implementing the preparedness framework and slowing down releases if there are issues. It seems intuitively plausible that it's possible to badly jailbreak o3 into doing dangerous things in the "high" risk category.
I think we're mostly on the same page that there are things worth forgoing the "pure personal-protection" strategy for, we're just on different pages about what those things are. We agree that "convince people to be much more cautious about LLM interactions" is in that category. I just also put "make my external brain more powerful" in that category, since it seems to have positive expected utility for now and lets me do more AI safety research in line with what pre-LLM me would likely endorse upon reflection. I am indeed trying to be very cautious about t...
I do try to be calibrated instead of being frog, yes. Within the range of time in which present-me considers past-me remotely good as an AI forecaster, my time estimate for these sorts of deceptive capabilities has pretty linearly been going down, but to further help I set myself a reminder 3 months from today with a link to this comment. Thanks for that bit of pressure, I'm now going to generalize the "check in in [time period] about this sort of thing to make sure I haven't been hacked" reflex.
I agree that this is a notable point in the space of options. I didn't include it, and instead included the bunker line, because if you're going to be that paranoid about LLM interference (as is very reasonable to do), it makes sense to try to eliminate second-order effects and never talk to people who talk to LLMs, for they too might be meaningfully harmful, e.g. be under the influence of particularly powerful LLM-generated memes.
I also separately disagree that LLM isolation is the optimal path at the moment. In the future it likely will be. I'd bet that I...
People often say "exercising makes you feel really good and gives you energy." I looked at this claim, figured it made sense based on my experience, and then completely failed to implement it for a very long time. So here I am again saying that no really, exercising is good, and maybe this angle will do something that the previous explanations didn't. Starting a daily running habit 4 days ago has already started being a noticeable multiplier on my energy, mindfulness, and focus. Key moments to concentrate force in, in my experience:
Right now, the USG seems to very much be in [prepping for an AI arms race] mode. I hope there's some way to structure this that is both legal and does not require the explicit consent of the US government. I also somewhat worry that the US government does their own capabilities research, as hinted at in the "datacenters on federal lands" EO. I also also worry that OpenAI's culture is not sufficiently safety-minded right now to actually sign onto this; most of what I've been hearing from them is accelerationist.
Interesting class of miscommunication that I'm starting to notice:
A: I'm considering a job in industries 1 and 2
B: Oh I work in 2, [jumps into explanation of things that will be relevant if A goes into industry 2].
A: Oh maybe you didn't hear me, I'm also interested in industry 1.
B: I... did hear you?
More generally, B gave the only relevant information they could from their domain knowledge, but A mistook that for anchoring on only one of the options. It took until I was on both sides of this interaction for me to be like "huh, maybe I should debug this." I suspect this is one of those issues where just being aware of it makes you less likely to fall into it.
I saw that news as I was polishing up a final draft of this post. I don't think it's terribly relevant to AI safety strategy, I think it's just an instance of the market making a series of mistakes in understanding how AI capabilities work. I won't get into why I think this is such a layered mistake here, but it's another reminder that the world generally has no idea what's coming in AI. If you think that there's something interesting to be gleaned from this mistake, write a post about it! Very plausibly, nobody else will.
Did you collect the data for their actual median timelines, or just its position relative to 2030? If you collected higher-resolution data, are you able to share it somewhere?
I really appreciate you taking the time and writing a whole post in response to my post, essentially. I think I fundamentally disagree with the notion that any part of this game is adversarial, however. There are competing tensions, one pulling to communicate more overtly about their feelings, and one pulling to be discreet and communicate less overtly. I don't see this as adversarial because I don't model the event "[A] finds out that [B] is into them" as terminally bad, just instrumentally bad; it is bad because it can cause the bad things, which is w...
Ah that's interesting, thanks for finding that. I've never read that before, so that wasn't directly where I was drawing any of my ideas from, but maybe the content from the post made it somewhere else that I did read. I feel like that post is mostly missing the point about flirting, but I agree that it's descriptively outlining the same thing as I am.
For people who are on the fence about donating and want an outside opinion:
I think CAIP is one of the best orgs at what they do in DC, and I think that what they do is important for many of the reasons Jason laid out above. I continue to think that efforts toward good AI governance are among the highest-leverage things we can do for AI safety right now. My experience is that the CAIP team is highly competent and very well networked across AI safety and the policy world.
My favorite review from a congressional staffer of one of the AI risk demos at a CAIP event is "...