You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed.
I've long thought that it's also true that an entrepreneur could build a tool that lets people easily see whether virtually anything they read or see on the internet is true.
On LessWrong, if a reader thinks something someone says in a post is false, they can highlight the sentence and Disagree-react to it. Everyone else reading the post can then see that the sentence is highlighted and see who disagreed with it. This is great for epistemics.
I envision a system (could be as simple as a browser extension) that allows users to frictionlessly report their feedback/beliefs when reading any content online, noting when things they read seem true or false or definitely false, etc. The system crowdsources all of this epistemic feedback and then uses the data to estimate whether things actually are true or false, and shares this insight with other users.
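To make that concrete, here is a minimal sketch, with all names hypothetical, of the kind of record such an extension might collect and the simplest possible aggregation over it:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Annotation:
    user_id: str     # who gave the feedback
    url: str         # page the claim appears on
    claim_text: str  # the highlighted sentence
    verdict: str     # e.g. "true", "false", "definitely_false"


def aggregate(annotations: list[Annotation]) -> dict[str, Counter]:
    """Tally verdicts per claim so later readers can see the distribution."""
    tallies: dict[str, Counter] = {}
    for a in annotations:
        tallies.setdefault(a.claim_text, Counter())[a.verdict] += 1
    return tallies
```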
Then someone reading a news article or post that 100 or more other people have already read would no longer be left to their own devices to determine which parts are true.
Perhaps some users might not trust the main algorithm's judgment and would prefer to choose a set of other users whose judgment they trust, and have their personalized algorithm give those people's epistemic feedback extra weight. Great: the system should have this feature.
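A sketch of that feature, building on the hypothetical Annotation records above: weight feedback from users the reader trusts more heavily than feedback from everyone else.

```python
def weighted_tally(annotations, trusted_users: set[str],
                   trust_weight: float = 3.0, base_weight: float = 1.0) -> dict:
    """One reader's personalized view: {claim_text: {verdict: summed weight}}."""
    scores: dict[str, dict[str, float]] = {}
    for a in annotations:
        w = trust_weight if a.user_id in trusted_users else base_weight
        claim = scores.setdefault(a.claim_text, {})
        claim[a.verdict] = claim.get(a.verdict, 0.0) + w
    return scores
```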
Perhaps some users mark something as false and later other users come along and show that it is true. Then perhaps the first users should have an epistemic score that goes down as a consequence of their mistaken/bad epistemic feedback.
Perhaps the system should track the quality of users' judgment over time to ascertain which users give reliable feedback and which have bad epistemics and largely just contribute noise.
There are a lot of features that could be added to such a system. But the point is that I've read far too many news articles and posts on the broader internet, noticed a mistake or inaccuracy or outright falsehood, and then moved on without sharing the observation with anyone because there was no efficient way to do so.
Surely there are also many inaccuracies that I miss, and I'd benefit from others who did catch them flagging them in a way that I, as a non-expert on the claim, could simply trust.
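One simple way to do that, sketched with made-up names and parameters: once the community settles on a consensus verdict for a claim, score each user's earlier feedback against it and keep a running reliability estimate.

```python
def update_reliability(score: float, agreed_with_consensus: bool,
                       rate: float = 0.05) -> float:
    """Exponential moving average of agreement with eventual consensus, in [0, 1]."""
    return (1 - rate) * score + rate * (1.0 if agreed_with_consensus else 0.0)


# Example: a new user starts at a neutral 0.5; a mistaken call nudges it down.
score = 0.5
score = update_reliability(score, agreed_with_consensus=False)  # -> 0.475
```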
First, environment: if you want to believe true things, try not to spend too much time around people who are going to sneeze false information or badly reasoned arguments into your face. You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed. Cultivate a social network that cares about true things.
This is good advice, but I really wish (and think it possible) that some competent entrepreneurs would make it much less needed by creating epistemic tools that enhance anyone's ability to discern what's true out in the wild, where people do commonly sneeze false information in your face.
What’s more, I think no private company should be in a position to impose this kind of risk on every living human, and I support efforts to make sure that no company ever is.
I don't see your name on the Statement on Superintelligence when I search for it. Assuming you didn't sign it, why not? Do you disagree with it?
It seems like an effort to make sure that no company is in the position to impose this kind of risk on every living human:
We call for a prohibition on the development of superintelligence, not lifted before there is
- broad scientific consensus that it will be done safely and controllably, and
- strong public buy-in.
(Several Anthropic, OpenAI, and Google DeepMind employees signed.)
Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren't saying this because we get a kick out of being bleak. It's just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that at least some commonly made counting arguments are wrong (Many arguments for AI x-risk are wrong) and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson I found the discussion too technical to follow.
In any case, the version of the counting argument in the book seems simple enough that as a layperson I can tell that it's wrong. To me it seems like it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling current methods will likely not choose to build a future full of happy, free people, simply because there are many more possible preferences an ASI could have than the narrow subset that would lead it to build such a future, I think the argument is wrong.
This counting observation does seem like a reason to think (so maybe the "no evidence" in the linked post title above is too strong) that the preferences an ASI ends up having might not be the preferences its creators try to train into it, because the target preferences are indeed a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target; you would need further reasons to conclude that.
Agreed that current models fail badly at alignment in many senses.
I still feel that the bet OP offered Collier was inappropriate. She stated that currently available techniques do a reasonably good job of making potentially alien and incomprehensible "jealous ex-girlfriends" like Sydney very rare, and the bet was clearly about a different claim than her claim about the frequency of Sydney-like behavior.
A more appropriate response from OP would have been to say that while current techniques may have successfully reduced the frequency of Sydney-like behavior, they're still failing badly in other respects, such as your observation with Claude Code.
But the way you are reading it seems to mean her "strawmann[ed]" point is irrelevant to the claim she made!
I agree.
(I only skimmed your review / quickly read about half of it. I agree with some of your criticisms of Collier's review and disagree with others. I don't have an overall take.)
One criticism of Collier's review you appeared not to make that I would make is the following.
Collier wrote:
By far the most compelling argument that extraordinarily advanced AIs might exist in the future is that pretty advanced AIs exist right now, and they’re getting more advanced all the time. One can’t write a book arguing for the danger of superintelligence without mentioning this fact.
I disagree. I think it was clear decades before the pretty advanced AIs of today existed that extraordinarily advanced AIs might exist (and indeed probably would exist) eventually. As such, the most compelling argument that extraordinarily advanced AIs might or probably will exist in the future is not that pretty advanced AIs exist today, but the same argument one could have made (and some did make) decades ago.
One version of the argument is that the limits of how advanced AI could be in principle seem extraordinarily advanced (human brains are an existence proof, and human brains have known limitations relative to machines), and it seems unlikely that AI progress would permanently stall before reaching a point where there are extraordinarily advanced AIs.
E.g. I.J. Good foresaw superintelligent machines, and I don't think he was just getting lucky to imagine that they might or probably would come to exist at some point. I think he had access to compelling reasons.
The existence of pretty advanced AIs today is some evidence and allows us to be a bit more confident that extraordinarily advanced AIs will eventually be built, but their existence is not the most compelling reason to expect significantly more capable AIs to be created eventually.
[C]urrently available techniques do a reasonably good job of addressing this problem. ChatGPT currently has 700 million weekly active users, and overtly hostile behavior like Sydney’s is vanishingly rare.
Yudkowsky and Soares might respond that we shouldn’t expect the techniques that worked on a relatively tiny model from 2023 to scale to more capable, autonomous future systems. I’d actually agree with them. But it is at the very least rhetorically unconvincing to base an argument for future danger on properties of present systems without ever mentioning the well-known fact that present solutions exist.
It is not a “well-known fact” that we have solved alignment for present LLMs. If Collier believes otherwise, I am happy to make a bet and survey some alignment researchers.
I think you're strawmanning her here.
Her "present solutions exist" statement clearly refers to her "techniques [that] do a reasonably good job of addressing this problem [exist]" from the previous paragraph that you didn't quote (that I added in the quote above). I.e. She's clearly not claiming that alignment for present LLMs is completely solved, just that solutions that work "reasonably well" exist such that overtly hostile behavior like Bing Sydney's is rare.
Fair review. As I've now said elsewhere, after listening to IABIED I think your book Uncontrollable is probably still the best overview of AI risk for a general audience. More people should definitely read your book. I'd be down to write a more detailed comparison in a week or two once I have hardcopies of each book (still in the mail).
My idea for this has been that rather than requiring all users to use and trust the extension's single foxy aggregation/deference algorithm, the tool ought to give users the freedom to choose between different aggregation mechanisms, including being able to select which users to epistemically trust or not. In other words, it could almost be like an epistemic social network where users can choose whose judgment they respect and have their aggregation algorithm give special weight to those users (as well as to users those users say they respect the judgment of).
Perhaps this would lead to some users using the system to support their own tribalism or whatever, with their personalized aggregation algorithm spitting out poor judgments, but I think it'd allow users like those on LW to use the tool and become more informed as a result.
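As a hedged sketch of how those personalized weights might extend one hop through a trust graph (all names and numbers are made up): directly trusted users get the most weight, users they trust get a discounted weight, and everyone else gets a small baseline.

```python
def personal_weights(my_trusted: set[str], trust_graph: dict[str, set[str]],
                     direct: float = 3.0, indirect: float = 1.5, base: float = 1.0):
    """Return a function mapping user_id -> weight for one reader's algorithm."""
    weights: dict[str, float] = {u: direct for u in my_trusted}
    for u in my_trusted:
        for v in trust_graph.get(u, set()):
            weights.setdefault(v, indirect)  # don't override a direct-trust weight
    return lambda user_id: weights.get(user_id, base)
```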
Yeah, exactly.
I think it'd be a valuable tool despite the challenges you mentioned.
I think the main challenge would be getting enough people to give the tool/extension enough input epistemic data; the (in my view) lesser challenge is making the outputs based on that data valuable enough to be informative to users.
And to solve this problem, I imagine the developers would have to come up with creative ways to make giving the tool epistemic data fast and low-friction (though maybe not: e.g., is submitting Community Notes fast or low-friction? I don't know, but perhaps not necessarily, and maybe some users do it anyway because they value the exposure and impact their note may have if approved).
And perhaps also making sure that the way users provide the input data allows that data to be aggregated by some algorithm. E.g., it's easy to aggregate submissions claiming a sentence is true or false, but what if a user just wants to flag a claim as misleading? Do you need a more creative way to capture that data if you want to communicate to other users the manner in which it is misleading, rather than just showing a "misleading" tag? (See the sketch below for one possibility.)

I haven't thought through these sorts of questions, but I strongly suspect there is some MVP version of the extension that I, at the very least, would value as an end user and would be happy to contribute to, even if only a few people I know would see my data/notes when reading the same content after the fact. Of course, the more people who use the tool and see the data, the more willing I'd be to contribute, assuming some small time cost of contributing data. I already spend time leaving comments to point out mistakes, and I imagine such a tool would just reduce the friction of providing that feedback.
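For what it's worth, here's the sketch I mentioned above of what richer feedback might look like (names are hypothetical): a "misleading" verdict must carry a short note explaining how the claim misleads, so the aggregator has something to show other readers beyond a bare tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    user_id: str
    claim_text: str
    verdict: str                # "true" | "false" | "misleading" | ...
    note: Optional[str] = None  # required when verdict == "misleading"

    def __post_init__(self):
        if self.verdict == "misleading" and not self.note:
            raise ValueError("A 'misleading' verdict needs a short explanatory note.")
```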