Yes, he is doing something, but he is optimizing for signal rather than the true thing. Becoming a drug addict, developing schizophrenia, killing yourself—those are all costly signals of engaging with the abyss.
What? Michael Vassar has (AFAIK from Zack M. Davis' descriptions) not taken drugs or promoted becoming a drug addict or "killing yourself". If you listen to his Spencer interview, you'll notice that he seems very sane and erudite, and clearly does not give off the unhinged 'Nick Land' vibe you seem to be claiming he has or promotes.
You ar...
As of right now, I expect we have at least a decade, perhaps two, until we get a human intelligence level generalizing AI (which is what I consider AGI). This is a controversial statement in these social circles, and I don't have the bandwidth or resources to write a concrete and detailed argument, so I'll simply state an overview here.
Scale is the key variable driving progress to AGI. Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien
IDK how to understand your comment as referring to mine.
I'm familiar with how Eliezer uses the term. I was more pointing to the move of saying something like "You are [slipping sideways out of reality], and this is bad! Stop it!" I don't think this usually results in the person, especially a confused person, reflecting and trying to become more skilled at epistemology and communication.
In fact, there's a loopy thing here where you expect someone who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits th...
I think James was implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose. I agree that he could have made it clearer, but I think he's made it clear enough given the following line:
I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff; and if takeoff is slow we're probably fine, if it's fast we die.
And as for your last sentence:
...If you don’t, you’re spra
Seems like most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist. This is an underspecified claim, and given certain fully-specified instances of it, I'd agree.
But this belief leads to the following reasoning: (1) if we don't eat all this free energy in the form of researchers+compute+funding, someone else will; (2) other people are clearly less trustworthy compared to us (Anthropic, in this hypothetical); (3) let's do whatever it takes to m...
most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist.
I don't credit that they believe that. And, I don't credit that you believe that they believe that. What did they do, to truly test their belief--such that it could have been changed? For most of them the answer is "basically nothing". Such a "belief" is not a belief (though it may be an investment, if that's what you mean). What did you do to truly test that they truly tested their be...
If you meet Buddha on the road...
I recommend messaging people who seem to have experience doing so, and requesting to get on a call with them. I haven't found any useful online content related to this, and everything I've learned in relation to social skills and working with neurodivergent people, I learned by failing and debugging my failures.
I hope you've at least throttled them or temporarily IP-blocked them for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.
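For what it's worth, respecting a site's limits on the scraper side can be as simple as a fixed-interval rate limiter. A minimal sketch (the interval value is an illustrative assumption, not anything the site in question actually requires):

```python
import time

class RateLimiter:
    """Allow at most one request per `interval` seconds."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        # Sleep just long enough to keep at least `interval` seconds
        # between consecutive calls.
        delay = self.interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

# Usage: call limiter.wait() before each HTTP request.
limiter = RateLimiter(interval=1.0)  # ~1 request/second, a polite default
```

Wrapping every fetch in `limiter.wait()` keeps the crawl well under most servers' capacity without any server-side throttling being needed.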
Yeah I think yours has achieved my goal -- a post to discuss this specific research advance. Please don't delete your post -- I'll move mine back to drafts.
I searched for it and found nothing. The Twitter conversation also seems to imply that there has not been a paper or technical report out yet.
Based on your link, it seems like nobody even submitted anything to the contest throughout the time it existed. Is that correct?
yet mathematically true
This only seems to be the case because the equals sign is redefined in that sentence.
I expect that Ryan means to say one of these things:
I've become somewhat pessimistic about encouraging regulatory power over AI development recently after reading this Bismarck Analysis case study on the level of influence (or lack of it) that scientists had over nuclear policy.
The impression I got from some other secondary/tertiary sources (specifically the book Organizing Genius) was that General Groves, the military man who was the interface between the military and Oppenheimer and the Manhattan Project, did his best to shield the Manhattan Project scientists from military and bureaucratic drudgery, and ...
I’m optimizing for consistently writing and publishing posts.
I agree with this strategy, and I plan to begin something similar soon. I forgot that Epistemological Fascinations is your less polished and more "optimized for fun and sustainability" substack. (I have both your substacks in my feed reader.)
I really appreciate this essay. I also think that most of it consists of sazens. When I read your essay, I find my mind bubbling up concrete examples of experiences I've had that confirm or contradict your claims. This is, of course, what I believe is expected of graduate students studying theoretical computer science or mathematics -- they encounter an abstraction, and it is on them to build concrete examples in their minds to get a sense of what the paper or textbook is talking about.
However, when it comes to more inchoate domai...
GPT-4o cannot reproduce the string, and instead just makes up plausible candidates. You love to see it.
Hmm. I assume you could fine-tune away an LLM from reproducing the string. Eliciting it would just become more difficult. Try posting canary text, and a part of the canary string, and see if GPT-4o completes it.
Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).
I'm curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.
The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.
that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.
I do want to be clear that a major issue is that Anthropic used non-disparagement agreements that were covered by non-disclosure agreements. I think that's an additional, much more insidious thing to do, one that contributed substantially to the harm caused by the OpenAI agreements, and an important fact to include here (it also makes the two situations even more analogous).
Project proposal: EpochAI for compute oversight
Detailed MVP description: a website with an interactive map that shows the locations of high-risk data centers globally, with relevant information appearing when you click on the icons on the map. Examples of relevant information: organizations and frontier labs that have access to this compute, the effective FLOPS of the data center, and how long it would take to train a SOTA model in that datacenter.
High-risk datacenters are datacenters capable of training current or next-generation SOTA AI systems.
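A rough sketch of what one map entry's backing data might look like (all field names and the FLOPS threshold are illustrative assumptions, not a real schema):

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    # Hypothetical fields for one icon on the interactive map.
    name: str
    latitude: float
    longitude: float
    effective_flops: float        # sustained training throughput, FLOP/s
    operators: list[str]          # orgs and frontier labs with access
    sota_train_time_days: float   # rough time to train a SOTA model here

def is_high_risk(dc: DataCenter, threshold_flops: float = 1e19) -> bool:
    """Flag datacenters capable of training current or next-gen SOTA systems.

    The threshold is a placeholder; a real version would derive it from
    published training-run compute estimates.
    """
    return dc.effective_flops >= threshold_flops
```

The frontend would then just filter the dataset through `is_high_risk` and render the survivors as map pins.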
Why:
Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).
Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.
Here's a relevant quote from the first essay in the sequence:
...Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence woul
There's generally a cost to managing people and onboarding newcomers, and I expect that offering to volunteer for free is usually a negative signal, since it implies that there's a lot more work than usual that would need to be done to onboard this particular newcomer.
Have you experienced otherwise? I'd love to hear some specifics as to why you feel this way.
I think we'll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren't able to continue experimenting to find algorithmic improvements.
Agent foundations research seems robust to such supply chain issues, but I'd argue that gigantic parts of the (non-academic, non-DeepMind-specific) conceptual alignment research ecosystem are extremely depe...
Thiel has historically expressed disbelief about AI doom, and has been more focused on trying to prevent civilizational decline. From my perspective, it is more likely that he'd fund an organization founded by people with accelerationist credentials, than by someone who was a part of a failed coup attempt that would look to him like it involved a sincere belief in an extreme difficulty of the alignment problem.
I'd love to read an elaboration of your perspective on this, with concrete examples, which avoids focusing on the usual things you disagree about (pivotal acts vs. pivotal processes, social facets of the game is important for us to track, etc.) and mainly focus on your thoughts on epistemology and rationality and how it deviates from what you consider the LW norm.
I started reading your meta-rationality sequence, but it ended after just two posts without going into details.
David Chapman's website seems like the standard reference for what the post-rationalists call "metarationality". (I haven't read much of it, but the little I read made me somewhat unenthusiastic about continuing).
Note that the current power differential between evals labs and frontier labs is such that I don't expect evals labs have the slack to simply state that a frontier model failed their evals.
You'd need regulation with serious teeth and competent 'bloodhound' regulators watching the space like a hawk, for such a possibility to occur.
I just encountered polyvagal theory and I share your enthusiasm for how useful it is for modeling other people and oneself.
Note that I'm waiting for the entire sequence to be published before I read it (past the first post), so here's a heads up that I'm looking forward to seeing more of this sequence!
I think Twitter systematically underpromotes tweets with links external to the Twitter platform, so reposting isn't a viable strategy.
Thanks for the link. I believe I read it a while ago, but it is useful to reread it from my current perspective.
trying to ensure that AIs will be philosophically competent
I think such scenarios are plausible: I know some people argue that certain decision theory problems cannot be safely delegated to AI systems, but if we as humans can work on these problems safely, I expect that we could probably build systems that are about as safe (by crippling their ability to establish subjunctive dependence) but are also significantly more competent at philosophical progress than we are.
Leopold's interview with Dwarkesh is a very useful source of what's going on in his mind.
What happened to his concerns over safety, I wonder?
He doesn't believe in a 'sharp left turn', which means he doesn't consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical-techniques problem, much like capabilities work. I assume he expects behavior similar to current RLHF-ed models even as frontier labs have dou...
Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any other outcome.
If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.
Unsee the frontier lab.
These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don't understand why people have downvoted this comment. Here's an attempt to unravel my thoughts and potential disagreements with your claims.
AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most peoples’ impact will be more than 5 years away.
I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like...
It seems like a significant amount of decision theory progress happened between 2006 and 2010, and since then progress has stalled.
Counterfactual mugging was invented independently by Gary Drescher in 2006, and by Vladimir Nesov in 2009.
Counterlogical mugging was invented by Vladimir Nesov in 2009.
The "agent simulates predictor" problem (now popularly known as the commitment races problem) was invented by Gary Drescher in 2010.
The "self-fulfilling spurious proofs" problem (now popularly known as the 5-and-10 problem) was invented by Benja Falle
You are omitting a ridiculous amount of context, but yes, if you are okay with leather footwear, Meermin provides great footwear at relatively inexpensive prices.
I still recommend thrift shopping instead. I spent 250 EUR on a pair of new boots from Meermin, and 50 EUR on a pair of thrifted boots which seem about 80% as aesthetically pleasing as the first pair (and just as comfortable, since I tried them on before buying them).
It has been six months since I wrote this, and I want to note an update: I now grok what Valentine is trying to say and what he is pointing at in Here's the Exit and We're already in AI takeoff. That is, I have a detailed enough model of Valentine's model of the things he talks about, such that I understand the things he is saying.
I still don't feel like I understand Kensho. I get the pattern of the epistemic puzzle he is demonstrating, but I don't know if I get the object-level thing he points at. Based on a reread of the comments, maybe what Valentine me...
I've experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.
I haven't experimented with a scaffolded LLM agent for autoformalization, but I expect it won't go very well either, primarily because scaffolding involves attempts to make human-like implicit high-lev...
This is very interesting, thank you for posting this.
the therapeutic idea of systematically replacing the concept “should” with less normative framings
Interesting. I independently came up with this concept, downstream of thinking about moral cognition and parts work. Could you point me to any past literature that talks about this coherently enough that you would point people to it to understand this concept?
I know that Nate has written about this:
As far as I recall, reading these posts didn't help me.
Based on gwern's comment, steganography as a capability can arise (at rather rudimentary levels) via RLHF over multi-step problems (which is effectively most cognitive work, really), and this gets exacerbated by the proliferation of AI-generated text that embeds steganographic signals within it.
The following paragraph by gwern (from the same thread linked in the previous paragraph) basically summarizes my current thoughts on the feasibility of prevention of steganography for CoT supervision:
...Inner-monologue approaches to safety, in the new skin
Well, if you know relevant theoretical CS and useful math, you don’t have to rebuild the mathematical scaffolding all by yourself.
I didn't intend to imply in my message that you have mathematical scaffolding that you are recreating, although I expect it may be likely (Pearlian causality perhaps? I've been looking into it recently and clearly knowing Bayes nets is very helpful). I specifically used "you" to imply that in general this is the case. I haven't looked very deep into the stuff you are doing, unfortunately -- it is on my to-do list.
I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).
I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligne...
According to Eliezer Yudkowsky, your thoughts should reflect reality.
I expect that the more your beliefs track reality, the better you'll get at decision making, yes.
According to Paul Graham, the most successful people are slightly overconfident.
Ah, but VCs benefit from the ergodicity of the startup founders! From the perspective of the founder, it's a non-ergodic situation. It's better to make Kelly bets instead if you prefer not to fall into gambler's ruin, given whatever definition of the real-world situation maps onto the abstract concept of being ...
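For concreteness, the Kelly criterion for a simple binary bet says to stake the fraction f* = p - (1 - p) / b of your bankroll, where p is the win probability and b is the net payout per unit staked. A minimal sketch (the numbers in the usage line are illustrative, not a claim about real startup odds):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly-optimal fraction of bankroll to wager on a binary bet.

    p: probability of winning (0 < p < 1)
    b: net odds -- you win b per 1 staked, lose the stake otherwise.
    A non-positive result means the bet has negative edge: stake nothing.
    """
    return p - (1.0 - p) / b

# E.g. a 60% chance of doubling your stake (b = 1) gives f* = 0.2,
# i.e. risk 20% of the bankroll -- far from the founder's all-in bet.
f = kelly_fraction(0.6, 1.0)
```

The contrast with the VC is exactly the ergodicity point: the VC holds many such bets in parallel, while the founder faces one sequential, ruin-capable trajectory.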
Even if I'd agree with your conclusion, your argument seems quite incorrect to me.
That's what math always is. The applicability of any math depends on how well the mathematical models reflect the situation involved.