This post explores the concept of simulators in AI, particularly self-supervised models like GPT. Janus argues that GPT and similar models are best understood as simulators that can generate various simulacra, not as agents themselves. This framing helps explain many counterintuitive properties of language models. Powerful simulators could have major implications for AI capabilities and alignment.
DeepSeek-R1-Lite-Preview was announced today. It's available via chatbot. (Post; translation of Chinese post.)
DeepSeek says it will release the weights and publish a report.
The model appears to be stronger than o1-preview on math, similar on coding, and weaker on other tasks.
DeepSeek is Chinese. I'm not really familiar with the company. I thought Chinese companies were at least a year behind the frontier; now I don't know what to think and hope people do more evals and play with this model. Chinese companies tend to game benchmarks more than the frontier Western companies, but I think DeepSeek hasn't gamed benchmarks much historically.
The post also shows inference-time scaling, like o1:
Note that o1 is substantially stronger than o1-preview; see the o1 post:
(Parts of this post and some of my comments are stolen from various people-who-are-not-me.)
It seems that 76.6% originally came from the GPT-4o announcement blog post. I'm not sure why it dropped to 60.3% by the time of o1's blog post.
Many of you readers may instinctively know that this is wrong. If you flip a coin (50% chance) twice, you are not guaranteed to get heads. The probability of getting at least one heads is 75%. However, you may be surprised to learn that there is some truth to this statement; modifying the statement just slightly will yield not just a true statement, but a useful and interesting one.
It's a spoiler, though. If you want to figure this out yourself as you read the article, you should skip this and then come back. Ok, ready? Here it is:
It's a 1/X chance and I did it X times, so the probability should be... 63%.
Almost always.
Suppose you're flipping a coin and you want to find the probability of NOT flipping a single heads in a...
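The excerpt cuts off here, but the computation it is setting up is standard; a quick sketch (my summary, not a quote from the post):

$$P(\text{no heads in two flips}) = \left(\frac{1}{2}\right)^2 = 25\%, \qquad P(\text{at least one heads}) = 1 - 25\% = 75\%.$$

More generally, for a 1/X chance attempted X times,

$$P(\text{at least one success}) = 1 - \left(1 - \frac{1}{X}\right)^{X} \;\longrightarrow\; 1 - \frac{1}{e} \approx 63.2\% \quad \text{as } X \to \infty,$$

and the convergence is fast: about 67% at X = 5 and about 65% at X = 10, which is why the "63%, almost always" phrasing works.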
Years ago, when I was hanging out with day traders, there was a heuristic they all seemed to hold: if their trading model was producing winning trades two out of three times, they thought the model was good and could be used. No one ever suggested why that particular rate was the shared meme/norm -- why not 4 out of 5, or 3 out of 5? I wonder if empirically (or just intuitively, over time) they simply approximated the results in this post.
Or maybe it's just a coincidence, but generally when money is at stake, I think common practices will tend to reflect some fundamental fact of the environment.
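(If the connection is numerical: two out of three is about 66.7%, while the post's $1 - 1/e$ limit is about 63.2% -- close, though not identical.)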
In the course of my life, there have been a handful of times I discovered an idea that changed the way I thought about where our species is headed. The first occurred when I picked up Nick Bostrom's book "Superintelligence" and realized that AI would utterly transform the world. The second was when I learned about embryo selection and how it could change future generations. And the third happened a few months ago when I read a message from a friend of mine on Discord about editing the genome of a living person.
We’ve had gene therapy to treat cancer and single gene disorders for decades. But the process involved in making such changes to the cells of a living person is excruciating and extremely expensive. CAR T-cell therapy,...
I think it is an obvious yes. I tend to think of intelligence as being energy-efficient, fast, accounting for any possibly useful stimuli or information, and creative. Now, creativity relies very much on memory. One of the functions of the nervous system is to navigate and keep the body safe from dying, basically keeping the system on. If that has to be done, then it is useful to have a nervous system that maps the environment properly, or at least well enough, collects information, and keeps a record of it too, so that when that encounter happens ...
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race for Machine Superintelligence. Consider subscribing to stay up to date with my work.
The US-China AI rivalry is entering a dangerous new phase.
Earlier today, the US-China Economic and Security Review Commission (USCC) released its annual report, with the following as its top recommendation:
...Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as
Oh, to be clear, I don't think it was bad for you to post this as-is. Just that I'd like to see more follow-up.
Nobody designing a financial system today would invent credit cards. The Western world uses credit cards because replacing legacy systems is expensive. China doesn't use credit cards. They skipped straight from cash to WeChat Pay. Skipping straight to the newest technology when you're playing catch-up is called leapfrogging.
A world-class military takes decades to create. The United States' oldest active aircraft carrier was commissioned in 1975. For reference, the Microsoft Windows operating system was released in 1985. The backbone of NATO's armed forces was designed for a world before autonomous drones and machine learning.
The United States dominates at modern warfare. Developed in WWII, modern warfare combines tanks, aircraft, artillery and mechanized[1] infantry to advance faster than the enemy can coordinate a response.
Modern warfare is expensive—and not just because...
You're right. I just like the phrase "postmodern warfare" because I think it's funny.
NB: This week there is a film-watching event afterwards. Vote in the comments on what film we watch. Yes, you have to read the sequences in order to join the film-watching.
Come get old-fashioned with us, and let's read the sequences at Lighthaven! We'll show up, mingle, do intros, and then split off into randomized groups for some sequences discussion. Please do the reading beforehand - it should be no more than 20 minutes of reading.
This group is aimed at people who are new to the sequences and would enjoy a group experience, but also at people who've been around LessWrong and LessWrong meetups for a while and would like a refresher.
This meetup will also have dinner provided! We'll be ordering pizza-of-the-day from Sliver (including 2 vegan pizzas).
The content-per-minute rate is too low: it follows 1960s film standards, when audiences weren't interested in science fiction films unless concepts were introduced to them very, very slowly (at the time they were quite satisfied by this due to lower standards, much as with Shakespeare's audiences).
As a result it is not enjoyable (people will be on their phones) unless you spend much of the film either thinking or talking with friends about how it might have affected the course of science fiction as a foundational work in the genre (almost every sci-fi fan and writer at the time watched it).
I haven't decided yet whether to write up a proper "Why Not Just..." for the post's proposal, but here's an overcompressed summary. (Note that I'm intentionally playing devil's advocate here, not giving an all-things-considered reflectively-endorsed take, but the object-level part of my reflectively-endorsed take would be pretty close to this.)
Charlie's concern isn't the only thing it doesn't handle. The only thing this proposal does handle is an AI extremely similar to today's, thinking very explicitly about intentional deception, and even then the propos...
Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In this post, we want to briefly share various frames and ideas we’ve been thinking through and actively pitching to Republican lawmakers over the past months in preparation for the possibility of a Trump win.
Why are we sharing this here? Given that >98% of the EAs and alignment researchers we surveyed earlier this year identified as everything-other-than-conservative, we consider thinking through these questions to be another strategically worthwhile neglected direction.
(Along these lines, we also want to proactively emphasize that politics is the mind-killer, and that, regardless of one’s ideological convictions, those who earnestly care about alignment must take seriously the possibility that Trump will be the US president...
Thanks for clarifying. By "policy" and "standards" and "compelled speech" I thought you meant something more than community norms and customs. This is traditionally an important distinction to libertarians and free speech advocates. I think the distinction carves reality at the joints, and I hope you agree. I agree that community norms and customs can be unwelcoming.
Yeah, IMO we should just add a bunch of functionality for integrating Alignment Forum stuff more with academic things. It's been on my to-do list for a long time.