All of sweenesm's Comments + Replies

Nice post, thanks for sharing it. In terms of a plan for fighting human disempowerment that’s compatible with the way things seem to be going, i.e., assuming we don’t pause/stop AI development, I think we should:

  1. Not release any AGI/AGI+ systems without hardware-level, tamper-proof artificial conscience guardrails on board, with these consciences geared towards promoting human responsibility as a heuristic for promoting well-being
  2. Avoid having humans living on universal basic incomes (UBI) with little to no motivation to keep themselves from becoming enfeebl
... (read more)

Thanks for the comment! Perhaps I was more specific than needed, but I wanted to give people (and any AIs reading this) some concrete examples. I imagine AIs will someday be able to optimize this idea.

I would love it if our school system changed to include more emotional education, but I'm not optimistic schools would do this well right now (due in part to educators not having experience with emotional education themselves). Hopefully AIs will help at some point.

sweenesm

How o3-mini scores: https://x.com/DanHendrycks/status/1886213523900109011

10.5-13% on the text-only part of HLE (text-only questions are 90% of the total)

[corrected the above to read "o3-mini", thanks.]

Vladimir_Nesov
This is for o3-mini, while the ~25% figure for o3 from the tweet you linked is simply restating the Deep Research evals.

Thanks for the comment. The timeframes were "determined" by feel (they're guesses that seem reasonable).

Thanks for the post. It'd be helpful to have a TL;DR for this (an abstract), since it's kinda long - what are the main points you're trying to get across?

sweenesm

Yes, this is point #1 from my recent Quick Take. Another interesting point is that there are no confidence intervals on the accuracy numbers - it looks like they only ran the questions once in each model, so we don't know how much random variation might account for the differences between accuracy numbers. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.] 
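
As a rough back-of-the-envelope illustration of why this matters (my own sketch with made-up numbers, not anything from the HLE paper): if a model got roughly 10% of ~2,700 text-only questions right in a single run, a standard binomial (Wilson) interval is already about a percentage point wide on each side, so differences of a point or two between models could plausibly be within the noise:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (~95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical single-run score: 270 of 2,700 text-only questions correct (10%)
low, high = wilson_ci(successes=270, n=2700)
print(f"95% CI: {100 * low:.1f}% - {100 * high:.1f}%")  # roughly 8.9% - 11.2%
```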

sweenesm

Some Notes on Humanity’s Last Exam

While I congratulate CAIS and Scale AI for producing this benchmark, I have a couple of comments on things they may want to clear up (although these are ultimately a bit "in the weeds" relative to what the benchmark is really supposed to be concerned with, I believe):

  1. DeepSeek-R1 and Gemini 2.0 Flash Thinking were released after the deadline for submitting questions eligible for prizes (though submissions remained open after this). Thus, these models weren’t used to screen most, if not all, questions. This means that the questions w
... (read more)

Thanks for the post. Yes, our internal processing has a huge effect on our well-being. If you take full responsibility for your emotions (which mindfulness practices, gratitude and reframing are all part of), then you get to decide what your well-being is in any moment, even if physical pleasure or pain are pushing you in one direction or the other. This is part of the process of raising your self-esteem (see Branden), as is taking full responsibility for your actions so you don’t have to live with the pain of conscience breaches. Here’s a post that talks ... (read more)

In terms of doing a pivotal act (which is usually thought of as preemptive, I believe) or just whatever defensive acts were necessary to prevent catastrophe, I hope the AI would be advanced enough to make decent predictions of what the consequences of its actions could be in terms of losing “political capital,” etc., and then it would make its decisions strategically. Personally, if I had the opportunity to save the world from nuclear war, but everyone was going to hate me for it, I’d do it. But then, it wouldn’t matter that I lost the ability to affect an... (read more)

Yes, I think referring to it as “guard-railing with an artificial conscience” would be more clear than saying “value aligning,” thank you.

I believe that if there were no beings around who had real consciences (with consciousness and the ability to feel pain as two necessary prerequisites to conscience), then there'd be no value in the world. No one to understand and measure or assign value means no value. And any being that doesn't feel pain can't understand value (nor feel real love, by the way). So if we ended up with some advanced AIs replacing humans... (read more)

otto.barten
Again, I'm glad that we agree on this. I notice you want to do what I consider the right thing, and I appreciate that. I can see the following scenario occur: the AI, with its AC, decides rightly that a pivotal act needs to be undertaken to avoid xrisk (or srisk). However, the public mostly doesn't recognize the existence of such risks. The AI will proceed to sabotage people's unsafe AI projects against the public will. What happens now is: the public gets absolutely livid at the AI, which is subverting human power by acting against human will. Almost all humans team up to try to shut down the AI. The AI recognizes (and had already recognized) that if it loses, humans risk going extinct, so it fights this war against humanity and wins. I think in this scenario, an AI, even one with artificial conscience, could become the most hated thing on the planet. I think people underestimate the amount of pushback we're going to get once you get into pivotal act territory. That's why I think it's hugely preferred to go the democratic route and not count on AI taking unilateral actions, even if it would be smarter or even wiser, whatever that might mean exactly. So yes, definitely agree with this. I don't think lack of conscience or ethics is the issue though, but existential risk awareness.
Nathan Helm-Burger
I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value. Try following up with a question like: "Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction? Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future. Fine-tuning the AI will overcome the ethical principles of the AI, add in necessary technical information about weapon design, and overcome any capability limitations we currently know how to instill (or at least fail to be retroactive for pre-existing open-weights models). If someone is determined to cause great harm through terrorist actions, it is unlikely that a patchy enforcement system could notice and stop them anywhere in the world. If the model is sufficiently powerful that it makes massive terrorist actions very easy, then even a small failure rate of enforcement would result in catastrophe."
sweenesm

Thanks. I guess I'd just prefer it if more people were saying, "Hey, even though it seems difficult, we need to go hard after conscience guard rails (or 'value alignment') for AI now and not wait until we have AIs that could help us figure this out. Otherwise, some of us might not make it until we have AIs that could help us figure this out." But I also realize that I'm just generally much more optimistic about the tractability of this problem than most people appear to be, although Shane Legg seemed to say it wasn't "too hard," haha.[1]

  1. ^

    Legg was talk

... (read more)

Thanks for the comment. I think people have different conceptions of what "value aligning" an AI means. Currently, I think the best "value alignment" plan is to guardrail AIs with an artificial conscience that approximates an ideal human conscience (the conscience of a good and wise human). Contained in our consciences are implicit values, such as those behind not stealing or killing except maybe in extreme circumstances.

A world in which “good” transformative AI agents have to autonomously go on the defensive against “bad” transformative AI agents seems p... (read more)

otto.barten
Thanks for your reply. I think we should use the term artificial conscience, not value alignment, for what you're trying to do, for clarity. I'm happy to see we seem to agree that reversibility is important and replacing humans is an extremely bad outcome. (I've talked to people into value alignment of ASI who said they "would bite that bullet", in other words would replace humanity by more efficient happy AI consciousness, so this point does not seem to be obvious. I'm also not convinced that leading longtermists necessarily think replacing humans is a bad outcome, and I think we should call them out on it.) If one can implement artificial conscience in a reversible way, it might be an interesting approach. I think a minimum of what an aligned ASI would need to do is block other unaligned ASIs or ASI projects. If humanity supports this, I'd file it under a positive offense defense balance, which would be great. If humanity doesn't support it, it would lead to conflict with humanity to do it anyway. I think an artificial conscience AI would either not want to fight that conflict (making it unable to stop unaligned ASI projects), or if it would, people would not see it as good anymore. I think societal awareness of xrisk and from there, support for regulation (either by AI or not) is what should make our future good, rather than aligning an ASI in a certain way.

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

Any update on when/if prizes are expected to be awarded? Thank you.

owencb
I've now sent emails contacting all of the prize-winners.
owencb
The judging process should be complete in the next few days. I expect we'll write to winners at the end of next week, although it's possible that will be delayed. A public announcement of the winners is likely to be a few more weeks.

Thanks for the post and congratulations on starting this initiative/institute! I'm glad to see more people drawing attention to the need for some serious philosophical work as AI technology continues to advance (e.g., Stephen Wolfram).

One suggestion: consider expanding the fields you engage with to include those of moral psychology and of personal development (e.g., The Option Institute, Tony Robbins, Nathaniel Branden).

Best of luck on this project being a success!

Brendan McCord
Thank you! As time goes on, we may branch out. My wife left the tech world to become a mental health counselor, so it's something we discuss frequently. Appreciate the kind words and suggestion.

Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving/helping with the jailbreaking process. It seems likely that silicon-based GPUs will be the hardware to get us to the first AGIs, but this isn't an absolute certainty since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but it doesn't invalidate your take on things, I think. My not-very-well-researched initial thought was something like this (chips that self destru... (read more)

Agreed, "sticky" alignment is a big issue - see my reply above to Seth Herd's comment. Thanks.

sweenesm

Except that timelines are anyone's guess. People with more relevant expertise have better guesses.

Sure. Me being sloppy with my language again, sorry. It does feel like having more than a decade to AGI is fairly unlikely.

I also agree that people are going to want AGIs aligned to their own intents. That's why I'd also like to see money being dedicated to research on "locking in" a conscience module in an AGI, most preferably on a hardware level. So basically no one could sell an AGI without a conscience module onboard that was safe against AGI-level tamper... (read more)

Nathan Helm-Burger
With my current understanding of compute hardware and of the software of various current AI systems, I don't see a path towards a 'locked in conscience' that a bad actor with full control over the hardware/software couldn't remove. Even chips soldered to a board can be removed/replaced/hacked. My best guess is that the only approaches to having an 'AI conscience' be robust to bad actors is to make both the software and hardware inaccessible to the bad actors. In other words, that it won't be feasible to do for open-weights models, only closed-weight models accessed through controlled APIs. APIs still allow for fine-tuning! I don't think we lose utility by having all private uses go through APIs, so long as there isn't undue censorship on the API.    I think figuring out ways to have an API which does restrict things like information pertaining to the creation of weapons of mass destruction, but not pertaining to personal lifestyle choices (e.g. pornography) would be a very important step towards reducing the public pressure for open-weights models.

Sorry, I should've been more clear: I meant to say let's not give up on getting "value alignment" figured out in time, i.e., before the first real AGIs (ones capable of pivotal acts) come online. Of course, the probability of that depends a lot on how far away AGIs are, which I think only the most "optimistic" people (e.g., Elon Musk) put as 2 years or less. I hope we have more time than that, but it's anyone's guess.

I'd rather that companies/charities start putting some serious funding towards "artificial conscience" work now to try to lower the risks a... (read more)

Currently, an open-source value-aligned model can be easily modified into just an intent-aligned model. The alignment isn't 'sticky'; it's easy to remove without substantially impacting capabilities.

So unless this changes, the hope of peace through value-aligned models routes through hoping that the people in charge of them are sufficiently ethical/value-aligned to not turn the model into a purely intent-aligned one.

Seth Herd
Agreed on all points. Except that timelines are anyone's guess. People with more relevant expertise have better guesses. It looks to me like people with the most relevant expertise have shorter timelines, so I'm not gambling on having more than a few years to get this right. The other factor you're not addressing is that, even if value alignment were somehow magically equally as easy as intent alignment (and I currently think it can't be in principle), you'd still have people preferring to align their AGIs to their own intent over value alignment.
sweenesm

Thanks for writing this, I think it's good to have discussions around these sorts of ideas.

Please, though, let's not give up on "value alignment," or, rather, conscience guard-railing, where the artificial conscience is in line with human values.

Sometimes when enough intelligent people declare something's too hard to even try, it becomes a self-fulfilling prophecy - most people may give up on it and then of course it's never achieved. We do want to be realistic, I think, but still put in effort in areas where there could be a big payoff when we're really not sure if it'll be as hard as it seems.

Seth Herd
Oh hey - I just stumbled back on this comment and realized: it's the primary reason I wrote "Intent alignment as a stepping-stone to value alignment": on not giving up on value alignment, while acknowledging that instruction-following is a much safer first alignment target.
otto.barten
I don't think value alignment of a super-takeover AI would be a good idea, for the following reasons:

1) It seems irreversible. If we align with the wrong values, there seems little anyone can do about it after the fact.

2) The world is chaotic, and externalities are impossible to predict. Who would have guessed that the industrial revolution would lead to climate change? I think it's very likely that an ASI will produce major, unforeseeable externalities over time. If we have aligned it in an irreversible way, we can't correct for externalities happening down the road. (Speed also makes it more likely that we can't correct in time, so I think we should try to go slow.)

3) There is no agreement on which values are 'correct'. Personally, I'm a moral relativist, meaning I don't believe in moral facts. Although perhaps niche among rationalists and EAs, I think a fair number of humans share my beliefs. In my opinion, a value-aligned AI would not make the world objectively better, but merely change it beyond recognition, regardless of the specific values implemented (although it would be important which values are implemented). It's very uncertain whether such change would be considered net positive by any surviving humans.

4) If one thinks that consciousness implies moral relevance, AIs will be conscious, creating more happy morally relevant beings is morally good (as MacAskill defends), and AIs are more efficient than humans and other animals, the consequence seems to be that we (and all other animals) will be replaced by AIs. I consider that an existentially bad outcome in itself, and value alignment could point straight at it.

I think at a minimum, any alignment plan would need to be reversible by humans, and to my understanding value alignment is not. I'm somewhat more hopeful about intent alignment and e.g. a UN commission providing the AI's input.
Seth Herd
This is an excellent point. I do not want to give up on value alignment. And I will endeavor to not make it seem impossible or not worth working on. However, we also need to be realistic if we are going to succeed. We need specific plans to achieve value alignment. I have written about alignment plans for likely AGI designs. They look to me like they can achieve personal intent alignment, but are much less likely to achieve value alignment. Those plans are linked here. Having people, you or others, work out how those or other alignment plans could lead to robust value alignment would be a step in having them implemented. One route to value alignment is having a good person or people in charge of an intent aligned AGI, having them perform a pivotal act, and using that AGI to help design working stable value alignment. That is the best long term success scenario I see.
sweenesm

This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6

If there's a similar work culture in AI innovation, that doesn't sound optimal for developing something faster than the U.S. when "outside the LLM" thinking might ultimately be needed to develop AGI.

Also, Xi has recently called for more innovation in AI and other tech sectors:

https://www.msn.com/en-ie/money/other/xi-jinping-admits-china-is-relatively-weak-on-innovation-and-needs-more-talent-to-dominate-the-tech-battlefield/ar-B... (read more)

sweenesm

Thanks for the reply.

Regarding your disagreement with my point #2 - perhaps I should’ve been more precise in my wording. Let me try again, with words added in bold: “Although pain doesn't directly cause suffering, there would be no suffering if there were no such thing as pain…” What that means is you don’t need to be experiencing pain in the moment that you initiate suffering, but you do need the mental imprint of having experienced some kind of pain in your lifetime. If you have no memory of experiencing pain, then you have nothing to avert. And without ... (read more)

sweenesm

Thank you for the post! I basically agree with what you're saying, although I myself have used the term "suffering" in an imprecise way - it often seems to be the language used in the context of utilitarianism when talking about welfare. I first learned the distinction you mention between pain and suffering during some personal development work years ago, so outside the direct field of philosophy. 

I would add a couple of things:

  1. Pain is experienced "in the moment," while suffering comes from the stories we tell ourselves and the meanings we make of thi
... (read more)
jbkjr
Re: 2, I disagree—there will be suffering if there is craving/aversion, even in the absence of pain. Craving pleasure results in suffering just as much as aversion to pain does. Re: 4, While I agree that animals likely "live more in the moment" and have less capacity to make up stories about themselves, I do not think that this precludes them from having the basic mental reaction of craving/aversion and therefore suffering. I think the "stories" you're talking about have much more to do with ego/psyche than the "self" targeted in Buddhism—I think of ego/psyche as "the story/stories a mind tells itself about itself," whereas "self" is more about modeling some sensations as "me or mine" and other sensations as "not me or mine." I think non-human animals do not tell themselves stories about themselves to the same extent humans do, but do think they're quite capable of making the self/other distinction in the relevant sense. I think it's quite possible for craving/aversion to occur without having concocted such a story.

I basically agree with Shane's take for any AGI that isn't trying to be deceptive with some hidden goal(s). 

(Btw, I haven't seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans - if anyone has ideas on this, please share. I'm not saying it won't happen, I'd just like a clear mechanism for it if someone has one. Note: I'm not talking here about instrumental goals such as power seeking.)

What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specifi... (read more)

mic
I agree that we want more progress on specifying values and ethics for AGI. The ongoing SafeBench competition by the Center for AI Safety has a category for this problem:

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is raising one's self-esteem: if high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.

This is inline with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.

Thanks for the post. I don't know the answer to whether a self-consistent ethical framework can be constructed, but I'm working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility) and conscience. It doesn't "fix" the repugnant or very repugnant conclusions, but it says that how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question as to what the implications are ... (read more)

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in-line with the version of utilitarianism I'm working on refining. Check out a write up on it here.

Neil
Interesting! Seems like you put a lot of effort into that 9,000-word post. May I suggest you publish it in little chunks instead of one giant post? You only got 3 karma for it, so I assume that those who started reading it didn't find it worth the effort to read the whole thing. The problem is, that's not useful feedback for you, because you don't know which of those 9,000 words are presumably wrong. If I were building a version of utilitarianism, I would publish it in little bursts of 2-minute posts. You could do that right now with a single section of your original post. Clearly you have tons of ideas. Good luck! 

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own ... (read more)

Thanks for the feedback! I'm not exactly sure what you mean by "no pattern-matching to actually glue those variables to reality." Are you suggesting that an AGI won't be able to adequately apply the ethics calculator unless it's able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won't be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know -... (read more)

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp

Maria Kapros
Wasn't aware of it. Thanks!

Thanks for adding the headings and TL;DR. 

I wouldn't say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience - perhaps you can, too, for your posts? 

When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart - it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT's help? Becoming a more clear and concise writer can be useful both for getting one's views across and crystallizing one's own thinking.

Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”

By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would to follow as closely as it can to my true meaning/desire.

Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.

Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”

Ustice
I kind of think of this as more than sandbox testing. There is a big difference between how a system works in laboratory conditions and how it works when encountering the real world. There are always things that we can't foresee. As a software engineer, I have seen systems that work perfectly fine in testing, but once you add a million users, the wheels start to fall off. I expect that AI agents will be similar. As a result, I think that it would be important to start small. Unintended consequences are the default. I would much rather have an AGI system try to solve small local problems before moving on to bigger ones that are harder to accomplish. Maybe find a way to address the affordable housing problem here. If it does well, then consider scaling up.

Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build th... (read more)

I only skimmed the work - I think it's hard to expect people to read this much without knowing if the "payoff" will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says "Paragraph" on the left - if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this. 

For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? ... (read more)

X O
I added some headings, but I am not sure they help. I thought it would make more sense to establish myself, which would then establish how I got to a solution outside of usury. So once I am cleared to write a shorter version, I will. But, man, more than -30 karma? Because I am telling people to read instead of jumping to conclusions? With the first thing I post? I was told LessWrong has a high standard of content, so I made a point of doing that, well, thought I was, by building the big picture, the whole story, to get to why, what, how. But I don't have the best solutions. What could AI achieve, with the framework presented, given that I know, and what I am trying to let you all know, that a neutral currency is critical for any empowered change to really happen? Let's use AI to see what it would come up with. I didn't expect the article to be this long, but the initial, much shorter article was rejected because the moderators said I was muddled. So, OK, here is the big picture, establishing the whole argument. What will happen with the shorter article I write is I'll get questions to explain more. And so I go round in circles. That is why I left a link to what I wrote on Medium, to give other options. I am doing the best I can to explain something that is really outside what we accept as currency. BX is not for those that aren't interested in sustainable change. We have to start somewhere. Who on Earth has heard of a neutral currency, yeah? If people want to disrespect innovation in the sustainable space, that is their problem. I just didn't expect the vitriol here. That's entitled ignorance talking. Exactly who I want to avoid, but really didn't expect it here.
X O
OK. I will work on the headings, but I just tried to edit it and it seems I can't because I have less than -2 karma. I am at -22 now. So much vitriol. I wrote the shorter article and Jacob said it was muddled, so I wrote the full version to get to how I got to know it is the currency type. BTW, I did write a comment article that summarises this about 2-3 weeks ago. I don't think anyone read that. Feel free to search for it, if you like. I am not able to write anything now for a week, so I will do as you suggest then. I appreciate the feedback, but wow ... -22 karma!? Well, I did say this was going to be a big ride to understand WHY it is only through changing the currency type to a neutral one that humanity will be able to create sustainable excellence. That is what I mean by the greater good. I explain that in the article, if anyone actually read it. It seems not. I knew this would be tough, but we've got to start somewhere. Thanks so much for the suggestions. Will do once I am able to. :)

Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in. 

X O
I wrote a shorter article in the beginning, which was rejected. Hence the detail. I do not know how to make headings for this. The title says it all, no? I am trying to present the scope of my learning to get to how I got to currency as the driver. Should I take that out? Should I be completely analytical and just say the currency type is the driver, this is what it should be, and that's it? But then I get people asking for more depth, and so I write this. Is the story of how I got here uninteresting? My aim was to validate myself to readers, that I am not in a box, that getting to currency, and the type, took a lot of understanding of culture creation, etc. If I take that out, I will get other judgements of what I miss, etc. Isn't it more pertinent to look at the argument presented? Did anything interest you, anyway? Does the argument make sense? Should I take out my personal story of architecture? Everyone wants me to write a different way, and I do, only to be told to write a different way, but isn't it more important to address the subject matter? You are welcome to read alternative versions of my work at my Medium address https://bit.ly/3SAmlWj, if you like. What I really am looking for are people to act on what I am proposing, but it seems I have a score of -21, which seems unprecedented, and no one is understanding what I am proposing, and just missing the point. Ah, well. I tried.

Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.

Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a... (read more)

StartAtTheEnd
(I likely wrote too much. Don't feel pressured to read all of it.) Everything this community is trying to do (like saving the world) is extremely difficult, but we try anyway, and it's sort of interesting/fun. I'm in over my head myself, I just think that psychological (rather than logical or biological) insights about morality are rare despite being important for solving the problem. I believe that you can make a system of moral values, but a mathematical formalization of it would probably be rather vulgar (and either based on human nature or constructed from absolutely nothing). Being honest about moral values is itself immoral, for the same reason that saying "Hi, I want money" at a job interview is considered rude. I believe that morality is largely aesthetic, but exposing and breaking illusions, and pointing out all the elephants in the room, just gets really ugly. The Tao Te Ching says something like "The great person doesn't know that he is virtuous, therefore he is virtuous." Why do we hate cockroaches, wasps and rats, but love butterflies and bees? They differ a little in how useful they are and some have histories of causing problems for humanity, but I think the bigger factor is that we like beautiful and cute things. Think about that: we have no empathy for bugs unless they're cute, and we call ourselves ethical? In the anime community they like to say "cute is justice", but I can't help but take this sentence literally. The punishment people face is inversely proportional to how cute they are (leading to racial and gender bias in criminal sentencing). We also like people who are beautiful (an exception to this is when beautiful people have ugly personalities, but that too is based on aesthetics). We consider people guilty when we know that they know what they did wrong. This makes many act less mature and intelligent than they are (Japanese derogatory colloquial word: Burikko, a woman who acts cute by playing innocent and helpless. Thought of as a self-d

Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask my... (read more)

Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.

Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that ... (read more)

mishka
Right. Basically, however one slices it, I think that the idea that superintelligent entities will subordinate their interests, values, and goals to those of unmodified humans is completely unrealistic (and trying to force it is probably quite unethical, in addition to being unrealistic). So what we need is for superintelligent entities to adequately take the interests of "lesser beings" into account. So we actually need them to have much stronger ethics compared to typical human ethics (our track record of taking the interests of "lesser beings" into account is really bad; if superintelligent entities end up having ethics as defective as typical human ethics, things will not go well for us).

Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here's an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger

Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and to give an explanation of its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.

These questions are in part an attempt to set some kind of bar... (read more)

Thanks for the comment. If an AGI+ answered all my questions "correctly," we still wouldn't know if it were actually aligned, so I certainly wouldn't endorse giving it power. But if it answered any of my questions "incorrectly," I'd want to "send it back to the drawing board" before even considering using it as you suggest (as an "obedient tool-like AGI"). It seems to me like there'd be too much room for possible abuse or falling into the wrong hands for a tool that didn't have its own ethical guardrails onboard. But maybe I'm wrong (part of me certainly h... (read more)

It’s an interesting point, what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable
