Summary LLM outputs vary substantially across models, prompts, and simulated perspectives. I propose "opinion fuzzing": systematically sampling across these dimensions to quantify and understand this variance. The concept is simple, but making it practically usable will require thoughtful tooling. In this piece I discuss what opinion fuzzing could be...
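The sampling idea can be sketched in a few lines. This is a minimal sketch, not the piece's actual tooling: `query_llm` is a hypothetical stub standing in for a real API call, and the grid of models and personas is invented for illustration.

```python
import itertools
import statistics

def query_llm(model: str, persona: str, prompt: str) -> float:
    # Hypothetical stub: a real implementation would query the named model
    # with a persona-conditioned prompt and parse a numeric judgment.
    # A deterministic toy score keeps the sketch runnable offline.
    return (hash((model, persona, prompt)) % 100) / 100

def opinion_fuzz(models, personas, prompts):
    """Sample one judgment per (model, persona, prompt) cell and report
    the spread -- a rough proxy for opinion variance across dimensions."""
    scores = [
        query_llm(m, p, q)
        for m, p, q in itertools.product(models, personas, prompts)
    ]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),
    }

result = opinion_fuzz(
    models=["model-a", "model-b"],
    personas=["skeptic", "optimist"],
    prompts=["Rate this policy from 0 to 1."],
)
print(result["n"])  # 2 models x 2 personas x 1 prompt = 4 samples
```

In practice the interesting output is the per-dimension breakdown (which axis drives the variance), which the real tooling would need to surface.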
Website version · Gestalt · Repo and data. [Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al (2025).] This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review...
Today we're releasing RoastMyPost, a new experimental application for evaluating blog posts with LLMs. Try it here. TLDR * RoastMyPost is a new QURI application that uses LLMs and code to evaluate blog posts and research documents. * It uses a variety of LLM evaluators. Most are narrow checks: Fact...
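A "narrow check" in this sense is a small, targeted pass rather than a holistic review. The sketch below shows the shape of two such checks under an assumed text-in, issues-out interface; the function names and heuristics are hypothetical, not RoastMyPost's actual API.

```python
import re

def check_unresolved_todos(text: str) -> list:
    """Flag leftover TODO/FIXME markers -- a purely code-based check."""
    return [m.group() for m in re.finditer(r"\b(TODO|FIXME)\b", text)]

def check_bare_percentages(text: str) -> list:
    """Flag percentage claims with no link shortly after them -- a crude
    stand-in for the kind of fact-oriented check an LLM pass might refine."""
    issues = []
    for m in re.finditer(r"\d+(\.\d+)?%", text):
        window = text[m.end():m.end() + 80]
        if "http" not in window:
            issues.append(m.group())
    return issues

post = "Our tool catches 95% of errors. TODO: add citation."
print(check_bare_percentages(post))  # ['95%']
print(check_unresolved_todos(post))  # ['TODO']
```

Keeping each evaluator this narrow makes its failures legible: a flagged span either is or is not a bare statistic, which is easier to audit than a single holistic score.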
See previous discussion here. I find a lot of professional events fairly soul-crushing and have been thinking about why. I dislike small talk. Recently I attended Manifest, and noticed that it could easily take 10 minutes of conversation to learn the very basics about a person. There were hundreds of...
A common fallacy I see regarding information sources is the assumption that they either have zero impact or completely determine critical decisions. * "Our World In Data isn't directly cited by the US President, therefore it's worthless." * "Prediction markets haven't been adopted by top elected officials, so they contribute...
One of my key thoughts is that the bar for “human intellectuals” is just really low. I think that intellectual work is useful and that these intellectuals on the whole do produce some value, but I also think we can do much better. Epistemic Status A collection of thoughts I've...
Summary Task: Make an interesting and informative Fermi estimate Prize: $300 for the top entry Deadline: February 16th, 2025 Results Announcement: By March 1st, 2025 Judges: Claude 3.5 Sonnet, the QURI team Motivation LLMs have recently made it significantly easier to make Fermi estimates. You can chat with most LLMs...