Thanks for the comment! Perhaps I was more specific than needed, but I wanted to give people (and any AIs reading this) some concrete examples. I imagine AIs will someday be able to optimize this idea.
I would love it if our school system changed to include more emotional education, but I'm not optimistic it would do this well right now (due in part to educators not having experience with emotional education themselves). Hopefully AIs will help at some point.
How o3-mini scores: https://x.com/DanHendrycks/status/1886213523900109011
10.5-13% on the text-only part of HLE (text-only questions are 90% of the total)
[corrected the above to read "o3-mini", thanks.]
Thanks for the comment. The timeframes were "determined" by feel (they're guesses that seem reasonable).
Thanks for the post. It'd be helpful to have a TL;DR for this (an abstract), since it's kinda long - what are the main points you're trying to get across?
Yes, this is point #1 from my recent Quick Take. Another interesting point is that there are no confidence intervals on the accuracy numbers - it looks like they only ran the questions once in each model, so we don't know how much random variation might account for the differences between accuracy numbers. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.]
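As a rough back-of-the-envelope sketch of how big the question-sampling noise alone might be (assuming ~2,700 text-only questions - my estimate from the 90% figure, not a number taken from the paper - and treating each question as an independent pass/fail trial):

```python
import math

def accuracy_ci(acc, n, z=1.96):
    # Normal-approximation 95% confidence interval on a benchmark accuracy,
    # treating each question as an independent Bernoulli (pass/fail) trial.
    se = math.sqrt(acc * (1 - acc) / n)
    return acc - z * se, acc + z * se

# Illustrative numbers only: ~13% accuracy on an assumed ~2,700 text-only questions.
low, high = accuracy_ci(0.13, 2700)
print(f"95% CI: {low:.1%} to {high:.1%}")  # roughly 11.7% to 14.3%
```

So even before considering run-to-run variation from stochastic decoding (the single-run issue above), differences of a percentage point or so between models could plausibly fall within this kind of interval.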
Some Notes on Humanity’s Last Exam
While I congratulate CAIS and Scale AI for producing this benchmark, I have a couple of comments on things they may want to clear up (although these are ultimately a bit "in the weeds" relative to what the benchmark is really supposed to be concerned with, I believe):
Thanks for the post. Yes, our internal processing has a huge effect on our well-being. If you take full responsibility for your emotions (which mindfulness practices, gratitude and reframing are all part of), then you get to decide what your well-being is in any moment, even if physical pleasure or pain are pushing you in one direction or the other. This is part of the process of raising your self-esteem (see Branden), as is taking full responsibility for your actions so you don’t have to live with the pain of conscience breaches. Here’s a post that talks ...
In terms of doing a pivotal act (which is usually thought of as preemptive, I believe) or just whatever defensive acts were necessary to prevent catastrophe, I hope the AI would be advanced enough to make decent predictions of what the consequences of its actions could be in terms of losing “political capital,” etc., and then it would make its decisions strategically. Personally, if I had the opportunity to save the world from nuclear war, but everyone was going to hate me for it, I’d do it. But then, it wouldn’t matter that I lost the ability to affect an...
Yes, I think referring to it as “guard-railing with an artificial conscience” would be more clear than saying “value aligning,” thank you.
I believe that if there were no beings around who had real consciences (with consciousness and the ability to feel pain as two necessary prerequisites to conscience), then there'd be no value in the world. No one to understand and measure or assign value means no value. And any being that doesn't feel pain can't understand value (nor feel real love, by the way). So if we ended up with some advanced AIs replacing humans...
Thanks. I guess I'd just prefer it if more people were saying, "Hey, even though it seems difficult, we need to go hard after conscience guard rails (or 'value alignment') for AI now and not wait until we have AIs that could help us figure this out. Otherwise, some of us might not make it until we have AIs that could help us figure this out." But I also realize that I'm just generally much more optimistic about the tractability of this problem than most people appear to be, although Shane Legg seemed to say it wasn't "too hard," haha.[1]
Legg was talk
Thanks for the comment. I think people have different conceptions of what "value aligning" an AI means. Currently, I think the best "value alignment" plan is to guardrail AIs with an artificial conscience that approximates an ideal human conscience (the conscience of a good and wise human). Contained in our consciences are implicit values, such as those behind not stealing or killing except maybe in extreme circumstances.
A world in which “good” transformative AI agents have to autonomously go on the defensive against “bad” transformative AI agents seems p...
Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:
For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.
Any update on when/if prizes are expected to be awarded? Thank you.
Thanks for the post and congratulations on starting this initiative/institute! I'm glad to see more people drawing attention to the need for some serious philosophical work as AI technology continues to advance (e.g., Stephen Wolfram).
One suggestion: consider expanding the fields you engage with to include those of moral psychology and of personal development (e.g., The Option Institute, Tony Robbins, Nathaniel Branden).
Best of luck on this project being a success!
Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving/helping with the jailbreaking process. It seems likely that silicon-based GPUs will be the hardware to get us to the first AGIs, but this isn't an absolute certainty since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but it doesn't invalidate your take on things, I think. My not-very-well-researched initial thought was something like this (chips that self destru...
Agreed, "sticky" alignment is a big issue - see my reply above to Seth Herd's comment. Thanks.
Except that timelines are anyone's guess. People with more relevant expertise have better guesses.
Sure. Me being sloppy with my language again, sorry. It does feel like having more than a decade to AGI is fairly unlikely.
I also agree that people are going to want AGIs aligned to their own intents. That's why I'd also like to see money being dedicated to research on "locking in" a conscience module in an AGI, most preferably on a hardware level. So basically no one could sell an AGI without a conscience module onboard that was safe against AGI-level tamper...
Sorry, I should've been more clear: I meant to say let's not give up on getting "value alignment" figured out in time, i.e., before the first real AGIs (ones capable of pivotal acts) come online. Of course, the probability of that depends a lot on how far away AGIs are, which I think only the most "optimistic" people (e.g., Elon Musk) put as 2 years or less. I hope we have more time than that, but it's anyone's guess.
I'd rather that companies/charities start putting some serious funding towards "artificial conscience" work now to try to lower the risks a...
Currently, an open source value-aligned model can be easily modified to just an intent-aligned model. The alignment isn't 'sticky', it's easy to remove it without substantially impacting capabilities.
So unless this changes, the hope of peace through value-aligned models routes through hoping that the people in charge of them are sufficiently ethical/value-aligned to not turn the model into a purely intent-aligned one.
Thanks for writing this, I think it's good to have discussions around these sorts of ideas.
Please, though, let's not give up on "value alignment," or, rather, conscience guard-railing, where the artificial conscience is in line with human values.
Sometimes when enough intelligent people declare something's too hard to even try at, it becomes a self-fulfilling prophecy - most people may give up on it and then of course it's never achieved. We do want to be realistic, I think, but still put in effort in areas where there could be a big payoff when we're really not sure if it'll be as hard as it seems.
This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6
If there's a similar work culture in AI innovation, that doesn't sound optimal for developing something faster than the U.S. when "outside the LLM" thinking might ultimately be needed to develop AGI.
Also, Xi has recently called for more innovation in AI and other tech sectors:
Thanks for the reply.
Regarding your disagreement with my point #2 - perhaps I should’ve been more precise in my wording. Let me try again, with words added in bold: “Although pain doesn't directly cause suffering, there would be no suffering if there were no such thing as pain…” What that means is you don’t need to be experiencing pain in the moment that you initiate suffering, but you do need the mental imprint of having experienced some kind of pain in your lifetime. If you have no memory of experiencing pain, then you have nothing to avert. And without ...
Thank you for the post! I basically agree with what you're saying, although I myself have used the term "suffering" in an imprecise way - it often seems to be the language used in the context of utilitarianism when talking about welfare. I first learned the distinction you mention between pain and suffering during some personal development work years ago, so outside the direct field of philosophy.
I would add a couple of things:
I basically agree with Shane's take for any AGI that isn't trying to be deceptive with some hidden goal(s).
(Btw, I haven't seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans - if anyone has ideas on this, please share. I'm not saying it won't happen, I'd just like a clear mechanism for it if someone has it. Note: I'm not talking here about instrumental goals such as power seeking.)
What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specifi...
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/
Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One's Self-Esteem: if high self-esteem can be thought of as consistently feeling good about oneself, then someone who takes responsibility for their emotions - recognizing that they can change their emotions at will - can consistently choose to feel good about and love themselves, as long as their conscience is clear.
This is in line with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.
Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility) and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.
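To give a flavor of what I mean, here's a purely hypothetical toy sketch (not the actual framework - all the names and weights are placeholders I'm making up for illustration): score a candidate action's net well-being effect, then adjust it for rights violations, the conscience cost to whoever brings the change about, and effects on people's self-esteem/personal responsibility.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    # Hypothetical, highly simplified summary of one candidate action's effects
    wellbeing_delta: float       # net change in well-being across affected people
    rights_violations: float     # severity-weighted tally of rights breached
    conscience_cost: float       # conscience burden on whoever brings the change about
    responsibility_gain: float   # effect on people's self-esteem / personal responsibility

def ethics_score(o: Outcome, w_rights=10.0, w_conscience=3.0, w_responsibility=1.0) -> float:
    # Toy scoring rule: a utilitarian well-being term adjusted by rights,
    # conscience, and self-esteem terms; the weights are placeholders only.
    return (o.wellbeing_delta
            - w_rights * o.rights_violations
            - w_conscience * o.conscience_cost
            + w_responsibility * o.responsibility_gain)

# A larger raw well-being gain (b) can still lose out to a smaller one (a)
# if the transition to it tramples a right or burdens someone's conscience.
a = Outcome(wellbeing_delta=5.0, rights_violations=0.0, conscience_cost=0.0, responsibility_gain=1.0)
b = Outcome(wellbeing_delta=8.0, rights_violations=1.0, conscience_cost=0.5, responsibility_gain=0.0)
print(max((a, b), key=ethics_score))  # picks a
```

Again, this is only meant to illustrate the shape of the idea; the real difficulty is in specifying the terms and their interactions consistently across situations.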
It’s an interesting question as to what the implications are ...
Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in line with the version of utilitarianism I'm working on refining. Check out a write-up on it here.
Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!
Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.
If other AI safety researchers are interested in a relatively easy way to get started on their own ...
Thanks for the feedback! I'm not exactly sure what you mean by "no pattern-matching to actually glue those variables to reality." Are you suggesting that an AGI won't be able to adequately apply the ethics calculator unless it's able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won't be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know -...
I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp
Thanks for adding the headings and TL;DR.
I wouldn't say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience - perhaps you can, too, for your posts?
When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart - it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT's help? Becoming a more clear and concise writer can be useful both for getting one's views across and crystallizing one's own thinking.
Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”
By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would to follow as closely as it can to my true meaning/desire.
Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.
Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”
Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build th...
I only skimmed the work - I think it's hard to expect people to read this much without knowing if the "payoff" will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says "Paragraph" on the left - if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this.
For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? ...
Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in.
Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.
Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a...
Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask my...
Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.
Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that ...
Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here's an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger
Thanks. Yup, agreed.
Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and to give an explanation of its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.
These questions are in part an attempt to set some kind of bar...
Thanks for the comment. If an AGI+ answered all my questions "correctly," we still wouldn't know if it were actually aligned, so I certainly wouldn't endorse giving it power. But if it answered any of my questions "incorrectly," I'd want to "send it back to the drawing board" before even considering using it as you suggest (as an "obedient tool-like AGI"). It seems to me like there'd be too much room for possible abuse or falling into the wrong hands for a tool that didn't have its own ethical guardrails onboard. But maybe I'm wrong (part of me certainly h...
It’s an interesting point, what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable
Nice post, thanks for sharing it. In terms of a plan for fighting human disempowerment that’s compatible with the way things seem to be going, i.e., assuming we don’t pause/stop AI development, I think we should:
- Not release any AGI/AGI+ systems without hardware-level, tamper-proof artificial conscience guardrails on board, with these consciences geared towards promoting human responsibility as a heuristic for promoting well-being
- Avoid having humans living on universal basic incomes (UBI) with little to no motivation to keep themselves from becoming enfeebled ...