Nice post, thanks for sharing it. In terms of a plan for fighting human disempowerment that’s compatible with the way things seem to be going, i.e., assuming we don’t pause/stop AI development, I think we should:

1. Not release any AGI/AGI+ systems without hardware-level, tamper-proof artificial conscience guardrails on board, with these consciences geared towards promoting human responsibility as a heuristic for promoting well-being.
2. Avoid having humans living on universal basic incomes (UBI) with little to no motivation to keep themselves from becoming enfeebled; a conditional supplemental income (CSI) might be one way to do this.

Does #1 have potential risks and pitfalls, and is it going to be difficult to figure out and implement in time? Yes, but more people focusing more effort on it would help. And AIs that have a conscience about disempowering humans seem like a good first step toward avoiding human disempowerment.

#1 would also help against what I think is a more immediate threat: the use of advanced AIs by bad human actors to purposely or uncaringly cause destruction, such as in the pursuit of making money. Autonomous, advanced defensive AIs with artificial conscience guardrails could potentially limit collateral damage while preventing or defending against attacks. Such attacks will likely unfold too quickly for humans to be in the loop on the decisions made to defend against them.

Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World

sweenesm, 2mo

Thanks for the comment! Perhaps I was more specific than needed, but I wanted to give people (and any AIs reading this) some concrete examples. I imagine AIs will someday be able to optimize this idea.

I would love it if our school system changed to include more emotional education, but I'm not optimistic it would do this well right now (due in part to educators not having experience with emotional education themselves). Hopefully AIs will help at some point.

OpenAI releases deep research agent

sweenesm, 2mo

How o3-mini scores: https://x.com/DanHendrycks/status/1886213523900109011

10.5-13% on the text-only part of HLE (text-only questions make up 90% of the exam)

[corrected the above to read "o3-mini", thanks.]

How to Give Coming AGI's the Best Chance of Figuring Out Ethics for Us

sweenesm, 2mo

Thanks for the comment. The timeframes were "determined" by feel (they're guesses that seem reasonable).

nikola's Shortform

sweenesm, 2mo

Yes, this is point #1 from my recent Quick Take. Another interesting point is that there are no confidence intervals on the accuracy numbers: it looks like they only ran the questions through each model once, so we don't know how much random variation might account for the differences between accuracy numbers. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.]

sweenesm's Shortform

sweenesm, 2mo

Some Notes on Humanity’s Last Exam

While I congratulate CAIS and Scale AI for producing this benchmark, I have a few comments on things they may want to clear up (although I believe these are ultimately a bit "in the weeds" relative to what the benchmark is really supposed to be concerned with):

1. DeepSeek-R1 and Gemini 2.0 Flash Thinking were released after the deadline for submitting questions eligible for prizes (though submissions remained open after this). Thus, these models weren't used to screen most, if not all, of the questions. This means the questions were preferentially screened to stump the other models but not these two, so it wouldn't be too surprising if these models scored better than the others.
2. After reading the paper, my impression is that the questions were run through each model only once (after the one time they were run through some of the models when originally submitted). If people want to get into the weeds and say that DeepSeek-R1 is actually better on this exam than OpenAI's o1, it would be useful to run the questions through each model at least 6 times to establish confidence intervals on the accuracy numbers. I suspect this would show that the differences between o1 and R1 are not statistically significant. It would be interesting to know the typical size of the confidence intervals, though, and whether that size shifts depending on whether "reasoning" is involved in the model. (It would also likely be useful if reporting on any and all AI benchmarks required including confidence intervals, so we could feel better that people weren't gaming the system by reporting only their best results.) Running the questions on more models that weren't originally used for question screening, such as Llama 3, could help establish even more of a baseline. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.]
3. 20% of the questions are multiple-choice. If each multiple-choice question has 5 possible answers, then random guessing would yield 4% accuracy on the exam as a whole (0.2 × 1/5 = 0.04). It would be interesting to know the actual average number of answer options for the multiple-choice questions, and thus the actual "random guessing accuracy."
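The confidence-interval and random-guessing points above both come down to small calculations. Below is a minimal Python sketch of each, offered as an illustration rather than how CAIS or Scale AI actually compute anything. The function names, the 100-question exam, and the per-run accuracies are all hypothetical placeholders, and exact-answer questions are assumed to contribute essentially nothing to guessing accuracy. The interval uses a small table of Student's t critical values so the sketch stays in the standard library, which limits it to 2 to 11 runs.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def random_guess_accuracy(option_counts, n_total_questions):
    """Expected whole-exam accuracy from pure guessing.

    option_counts holds the number of answer options for each multiple-choice
    question. Exact-answer questions are assumed to contribute ~0 by guessing.
    """
    expected_correct = sum(1.0 / k for k in option_counts)
    return expected_correct / n_total_questions


def mean_with_ci95(run_accuracies):
    """Mean accuracy over repeated runs plus a 95% t-based confidence interval."""
    n = len(run_accuracies)
    if (n - 1) not in T_CRIT_95:
        raise ValueError("this small lookup table supports 2 to 11 runs")
    mean = statistics.mean(run_accuracies)
    sem = statistics.stdev(run_accuracies) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical 100-question exam with 20 five-option multiple-choice questions.
print(random_guess_accuracy([5] * 20, 100))  # about 0.04

# Hypothetical accuracies from 6 independent runs of one model.
print(mean_with_ci95([0.09, 0.10, 0.11, 0.10, 0.09, 0.11]))
```

Overlapping intervals for two models are only a rough signal; a formal comparison would use a proper two-sample test.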

Also, it might be interesting to take some of the multiple-choice questions and rewrite them by randomly removing one of the answer choices and replacing it with "none of the above." If the model chooses "none of the above," then see if it can come up with the right answer on its own, rather than from a list (if indeed the right answer isn't present). Personally, I always found multiple-choice questions in which you weren't sure whether the right answer was there to be more difficult: when the right answer is there, you can sometimes take clues from it to figure out that it's the right answer. Rewriting some questions in this way could make them a little more difficult without much added work by the exam preparers.
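A minimal Python sketch of this rewrite and how it might be graded follows. Everything in it is an assumption for illustration: the `MCQuestion` class and the `add_none_of_the_above` and `grade` functions are hypothetical, "None of the above" is simply appended as the last option, and the free-form check uses exact (case-insensitive) string matching, which a real grader would need to loosen.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

NOTA = "None of the above"


@dataclass
class MCQuestion:
    prompt: str
    choices: list[str]
    answer: str  # text of the correct choice, or NOTA


def add_none_of_the_above(q: MCQuestion, rng: random.Random) -> tuple[MCQuestion, str]:
    """Drop one randomly chosen option and append "None of the above".

    Returns the rewritten question and the removed option's text. If the
    removed option was the correct one, the correct answer becomes NOTA.
    """
    idx = rng.randrange(len(q.choices))
    removed = q.choices[idx]
    choices = q.choices[:idx] + q.choices[idx + 1:] + [NOTA]
    answer = NOTA if removed == q.answer else q.answer
    return MCQuestion(q.prompt, choices, answer), removed


def grade(q: MCQuestion, chosen: str, free_form: str | None = None,
          true_answer: str | None = None) -> bool:
    """Credit NOTA only if the model also produces the missing answer itself."""
    if chosen != q.answer:
        return False
    if chosen == NOTA:
        # Exact (case-insensitive) match is a simplification for this sketch.
        return (free_form is not None and true_answer is not None
                and free_form.strip().lower() == true_answer.strip().lower())
    return True


# Hypothetical usage: rewrite a five-option question with a fixed seed.
original = MCQuestion("Which option is correct?", ["A", "B", "C", "D", "E"], "B")
rewritten, removed = add_none_of_the_above(original, random.Random(0))
print(rewritten.choices, "correct:", rewritten.answer)
```

Keeping the number of options unchanged means the random-guessing baseline for the rewritten question stays the same, so any drop in accuracy would come from the missing answer rather than from fewer choices.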

Finally, having models take the multiple-choice section of the exam numerous times with slight variations in the wording of the questions, without changing their meanings, could make this section of the exam a little more robust against “luck.”
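How to score the reworded versions is open; one option, which is my assumption rather than anything the exam specifies, is to score each question by the fraction of its wordings answered correctly, and optionally to report a stricter "correct on every wording" number alongside it. The function names and the results below are hypothetical.

```python
from statistics import mean


def paraphrase_averaged_accuracy(results):
    """results maps question id -> list of bools, one per reworded variant.

    Each question scores the fraction of its variants answered correctly, so a
    lucky hit on one wording counts for less than consistent correctness.
    """
    return mean(mean(float(ok) for ok in variants) for variants in results.values())


def consistent_accuracy(results):
    """Stricter view: a question counts only if every variant was answered correctly."""
    return mean(1.0 if all(variants) else 0.0 for variants in results.values())


# Hypothetical results for two questions, each asked in four wordings.
results = {"q1": [True, True, False, True], "q2": [False, False, True, False]}
print(paraphrase_averaged_accuracy(results))  # 0.5
print(consistent_accuracy(results))           # 0.0
```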

Note: I submitted two materials science-related multiple-choice questions with 5 answer options each for consideration in Humanity's Last Exam. For submitting questions (https://agi.safe.ai/submit), the process was to type your question in an input window, enter as many multiple-choice answers as you wanted (I think the minimum was 5 and there might not have been a maximum), and then the question was run through various models (GPT-4o, Sonnet 3.5, Gemini Pro 1.5, o1) to see if they gave the correct answer. The paper says that the screening criterion was that "multiple-choice questions must stump all but one model to account for potential lucky guesses." I think I only submitted my questions if they stumped all the models.
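The screening rule quoted above can be stated as a few lines of code. This hypothetical Python sketch takes the model list and the "all but one" threshold from the description above; the function name and the answer letters are made up, and setting the threshold to 0 gives the stricter "stump every model" bar I used for my own submissions.

```python
def passes_mc_screen(model_answers, correct, max_models_correct=1):
    """Return True if at most `max_models_correct` models chose the correct option.

    The default of 1 mirrors the "stump all but one model" criterion;
    max_models_correct=0 corresponds to stumping every screening model.
    """
    n_correct = sum(1 for choice in model_answers.values() if choice == correct)
    return n_correct <= max_models_correct


# Hypothetical answers from the four screening models to one question.
answers = {"GPT-4o": "C", "Sonnet 3.5": "A", "Gemini Pro 1.5": "D", "o1": "B"}
print(passes_mc_screen(answers, correct="B"))                        # True
print(passes_mc_screen(answers, correct="B", max_models_correct=0))  # False
```

Because only these models were consulted, a question that passes this screen says nothing about models left out of it, which is the screening bias raised in point #1 above.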

In case you're interested, you can find my one question that made the cut by searching for "sintering" in the dataset available at HuggingFace. For my one question that didn't make the cut, my strategy was to focus on an area in which some false ideas were presented in the literature and later cleared up. I figured this might make it harder for LLMs to answer correctly. I don't know why the question didn't make the cut, though, so don't take this strategy as the reason. Just note that it's possible some of the other questions that made the final list were written with this sort of strategy in mind.

Well-being in the mind, and its implications for utilitarianism

sweenesm, 2mo

Thanks for the post. Yes, our internal processing has a huge effect on our well-being. If you take full responsibility for your emotions (which mindfulness practices, gratitude and reframing are all part of), then you get to decide what your well-being is in any moment, even if physical pleasure or pain are pushing you in one direction or the other. This is part of the process of raising your self-esteem (see Branden), as is taking full responsibility for your actions so you don’t have to live with the pain of conscience breaches. Here’s a post that talks more about these things.

If we solve alignment, do we die anyway?

sweenesm, 2mo

In terms of doing a pivotal act (which is usually thought of as preemptive, I believe) or just whatever defensive acts were necessary to prevent catastrophe, I hope the AI would be advanced enough to make decent predictions of the consequences of its actions in terms of losing "political capital," etc., and would then make its decisions strategically. Personally, if I had the opportunity to save the world from nuclear war, but everyone was going to hate me for it, I'd do it. But in my case, losing the ability to affect anything afterward wouldn't matter as much as it would for a guard-railed AI that could do a huge amount of good afterward if it weren't shunned by society. Improving humans' consciences and ethics would hopefully help keep them from hating the AI for saving them.

Also, if there were enough people, especially in power, who had strong consciences and senses of ethics, then maybe we’d be able to shift the political landscape from its current state of countries seemingly having different values and not trusting each other, to a world in which enforceable international agreements could be much more readily achieved.

I'm happy for people to work on increasing public awareness and trying for legislative "solutions," but I think we should be working on artificial conscience at the same time. When there's so much uncertainty about the future, it's best to bet on a whole range of approaches, distributing your bets according to how likely you think different paths are to succeed. I think people are underestimating the artificial conscience path right now, that's all.

Thanks for all your comments!

If we solve alignment, do we die anyway?

sweenesm, 2mo

Yes, I think referring to it as “guard-railing with an artificial conscience” would be more clear than saying “value aligning,” thank you.

I believe that if there were no beings around who had real consciences (with consciousness and the ability to feel pain as two necessary prerequisites to conscience), then there'd be no value in the world. No one to understand and measure or assign value means no value. And any being that doesn't feel pain can't understand value (nor feel real love, by the way). So if we ended up with some advanced AIs replacing humans, then we made some sort of mistake. Most likely, either we got the artificial conscience wrong (a correct one would have implicitly valued human life and so wouldn't have let a guard-railed AI wipe out humans), or we didn't get an artificial conscience on board enough AIs in time. An AI that had a "real" conscience also wouldn't wipe out humans against the will of humans.

The way I currently envision the "typical" artificial conscience is that it would put a pretty strong conscience weight on not doing what its user wanted it to do, but this could be overruled by the conscience weight of not doing anything to prevent catastrophes. So the defensive, artificial conscience-guard-railed AI I'm thinking of would do the "last resort" things that were necessary to keep s-risks, x-risks, and major catastrophes from coming to fruition, even if this wasn't popular with most people, at least up to a point. If literally everyone in the world said, "Hey, we all want to die," then the guard-railed AI, if it thought the people were in their "right mind," would respect their wishes and let them die.

All that said, if we could somehow pause development of autonomous AIs everywhere around the world until humans got their act together, developing their own consciences and senses of ethics, and were working as one team to cautiously take the next steps forward with AI, that would be great.