All of metacoolus's Comments + Replies

That's an interesting perspective. It suggests that rather than building research teams that all pursue the same approach together, it would be better to encourage a wider diversity of approaches, in the hope of developing different intuitions.

This raises the question of what's more valuable to the field: researchers who can expand on a specific, targeted direction, or researchers whose new ideas open up additional areas?

Perhaps part of the solution is recognizing that both attributes can be mutually beneficial in different ways. Whil... (read more)

This discussion sheds light on an important consideration in AI: the loss or mutation of agency when systems or agents are aggregated. It's vital to remember that optimizers aren't always built to fit neatly alongside other optimizers, and achieving harmony can require sacrifices. While John's point about weak efficiency is well taken, finding a balance between descriptive and prescriptive models is essential. I appreciate the counterargument's reasoning that capable aggregates of optimizers don't pass up sure gains. Energy is a zero-sum currency: when efficiencies are revealed, smart agents will find ways to fight for them.

That's a great perspective. Do you think there's potential for applying the skills, logic, and values of the rationalist community to prison reform, and to predicting better outcomes? Data analysis is already applied to predicting recidivism, but could those models be further calibrated or improved using the data-driven approaches often employed by the rationalist and AI communities? The idea would be to incorporate factors like trust-building, safe transition, and prosociality.
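As a minimal sketch of the kind of calibration the comment gestures at (my own illustration, not anything from the thread): fit a risk classifier and check whether an explicitly calibrated version of the same model produces probabilities that better match observed outcomes. Everything here is synthetic data and assumed model choices; it only shows the mechanics, not a claim about real recidivism data.

```python
# Sketch: comparing an uncalibrated vs. calibrated risk classifier on
# synthetic data. Requires scikit-learn. All data and model choices are
# illustrative assumptions, not a real recidivism pipeline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model vs. an isotonic-calibrated wrapper around the same model.
base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="isotonic", cv=5
).fit(X_train, y_train)

# Lower Brier score = predicted probabilities track outcomes more closely.
for name, model in [("base", base), ("calibrated", calibrated)]:
    probs = model.predict_proba(X_test)[:, 1]
    print(name, brier_score_loss(y_test, probs))
```

Calibration is cheap relative to collecting new data, which is part of why it's an appealing first lever; whether it moves real-world outcomes is a separate question, as the reply below argues.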

2PoignardAzur
Ha! Of course not. Well, no, the honest answer would be "I don't know, I don't have any personal experience in that domain." But the problems I cited (lack of budget, the general population actively wanting conditions not to improve) can't be fixed with better data analysis. From anecdotes I've heard from civil servants: directors love new data analysis tools, because they promise to improve outcomes without a budget raise. Staff hate new data analysis tools, because they represent more work without a budget raise, and they desperately want the budget raise. I mean, yeah, rationality and thinking hard about things always helps on the margin, but it doesn't compensate for a lack of budget or political goodwill. The secret ingredients that make a reform work are money and time.

Yes! This is an excellent approach. Rather than focusing only on whether there is malicious intent, keeping in mind the more practical goal of wanting the bad behavior to *stop*, and seeking to understand how it might play out over time, is a much more effective way of resolving the problem. Using direct communication to try to fix the situation, or to establish a history of negligent or malicious behavior, is very powerful.

Oh, this is a great complication: you highlight why mental moves like “reflection” can open potential loopholes and complications. Whether or not it is a necessary or less central part of research, as you suggest, self-modifying goal-finding is always a potential issue in AI alignment. I appreciate the notion of a “noticeable lack.” This kind of thinking pushes us to take stock of how, and whether, AIs are actually doing useful alignment research under benign-seeming training setups.

Is it *noticeably* lacking, or is it clearing an expected bar? This nua... (read more)

Your concern is certainly valid: blindly assuming that taking action is beneficial misses the mark. It's often far better to refrain from embracing disruptive technologies than to adopt them simply to appear progressive. Thinking of ways to ensure people will not promote AI for the sole sake of creating an agent overhang is indeed crucial for reducing potential existential threats. Fearlessly rejecting risky technologies is often better than blindly accepting them. With that mindset, encouraging users to explore AutoGPT and other agent-based systems is potentially problematic. Instead, developing strategies for limiting the dangerous aspects of such systems should take center stage.

Your exploration of the neural circuitry concerned with modeling human emotions is a fascinating direction, especially in the context of AGI and LLMs potentially behaving deceptively. It raises the question: how might the identification and modification of these emotional circuits affect the overall performance of AI models, given that their deceptive abilities might be an emergent property of their training? Additionally, if we were to inhibit such circuitry, would there be unintended consequences in other aspects of the AI's cognition or alignment? It's exciting to ponder these questions as we delve deeper into alignment research!
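To make "inhibit such circuitry" a little more concrete, here is a minimal sketch (my own illustration, not anything from the original post) of activation ablation: zeroing out a few hidden dimensions of one layer via a PyTorch forward hook. The model, the choice of layer, and the flagged dimension indices are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical toy model whose layer activations we want to ablate.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Suppose prior analysis flagged these hidden dimensions of layer 0 as
# part of a circuit we want to inhibit (assumed indices, for illustration).
SUSPECT_DIMS = [3, 17, 42]

def ablate_dims(module, inputs, output):
    """Forward hook: zero out the suspect dimensions of the layer output."""
    output = output.clone()
    output[..., SUSPECT_DIMS] = 0.0
    return output  # returned value replaces the layer's output

handle = model.layers[0].register_forward_hook(ablate_dims)

x = torch.randn(1, 10, 64)  # (batch, seq_len, d_model)
with torch.no_grad():
    ablated_out = model(x)

handle.remove()  # restore normal behavior
```

Comparing model behavior with and without the hook is exactly the kind of experiment that would surface the "unintended consequences" the comment asks about: ablating a circuit rarely removes just one capability cleanly.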

Your comment about using humor to navigate delicate meta-conversations is thought-provoking. It's fascinating how confidence and bluntness can often help accomplish one's goals in social situations, and they can indeed be useful rationality tools. The challenge, however, seems to be striking a balance between being assertive and avoiding harm. Do you think this approach may sometimes risk pushing people into defensive modes, or obscuring important underlying issues? How would you determine when this method is most effective?

3Οἰφαισλής Τύραννος
Consider that I can only speak from my experience. It is a risky move, for sure, and you are going to piss off some people. But I have found that said pissed-off people are almost always inclined not only to forgive you if you make the smallest gesture of peace, but to befriend you and appreciate you. I have found that people really appreciate honesty, and acting this way comes off as idealistic and honest. Whereas I have found that you can't really recover in practice from being seen as pathetic, whiny, or weak (only a long time and a miracle can make you recover from that). And I believe that most delicate approaches are perceived as low status and as coming from frailty.

In my experience, women are far more unforgiving towards weakness and more lenient towards assholeness. With men, you will need to concede and lose from time to time; I strongly advise against "winning" too much against men. You need to let them take some jabs, even if you have thought of the perfect answer and could always come out on top.

The worst of this approach will be felt when the other person is depressed, very insecure, and places himself at the lowest echelon of the hierarchy, but doesn't accept said position and deludes himself into thinking that he is much better than he is. I say: avoid that kind of person as a general rule, and this is a good test to detect them; they are usually vulnerable narcissists, or something very similar, and they can't take the slightest jab without feeling injured and vengeful. If you feel deeply hurt and resentful at any kind of negative feedback, learn how to sincerely laugh at yourself.

The more secure and healthy the people around you, the better they will receive this approach. Low-status people who don't delude themselves will also look up to you, particularly timid people who would like to act like you but don't find the courage to do so. You must also come across as fundamentally good. If you are seen as ultimately evil and

I appreciate the detailed taxonomy in this post; it's an insightful way to analyze the gaps between understanding and implementing decision theory. That said, I believe it would be even more compelling to explore how AI-based cognitive augmentation could help humans bridge these gaps and navigate decision processes better. It would also be interesting to examine what GPT-style models can teach us about AGI and its alignment with human values. Overall, a great read!

2Max H
Thanks! Glad at least one person read it; this post set a new personal record for low engagement, haha. I think exploring ways that AIs and/or humans (through augmentation, neuroscience, etc.) could implement decision theories more faithfully is an interesting idea. I chose not to focus directly on AI in this post, since I think LW, and my own writing specifically, has been kinda saturated with AI content lately. And I wanted to keep this shorter and lighter, in the (apparently doomed) hope that more people would read it.
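As a toy illustration of what "implementing a decision theory" explicitly can look like (my own sketch, not from the post), here are the expected-value calculations for Newcomb's problem under EDT-style and CDT-style reasoning, using the standard stylized payoffs and an assumed predictor accuracy:

```python
# Toy sketch: explicit EDT vs. CDT expected values on Newcomb's problem.
# Payoffs are the standard stylized ones; predictor accuracy is assumed.

PREDICTOR_ACCURACY = 0.99

def edt_value(one_box: bool) -> float:
    """EDT conditions on the action as evidence about the prediction."""
    if one_box:
        # If you one-box, the predictor very likely filled the opaque box.
        return PREDICTOR_ACCURACY * 1_000_000 + (1 - PREDICTOR_ACCURACY) * 0
    # If you two-box, the predictor very likely left it empty.
    return PREDICTOR_ACCURACY * 1_000 + (1 - PREDICTOR_ACCURACY) * 1_001_000

def cdt_value(one_box: bool, p_box_filled: float) -> float:
    """CDT holds the (already-made) prediction fixed when evaluating acts."""
    base = p_box_filled * 1_000_000
    return base if one_box else base + 1_000

print("EDT:", edt_value(True), "vs", edt_value(False))  # one-boxing wins
for p in (0.0, 0.5, 1.0):
    # For every fixed probability that the box is filled, two-boxing
    # dominates by exactly 1,000: the classic CDT verdict.
    print(f"CDT (p={p}):", cdt_value(True, p), "vs", cdt_value(False, p))
```

The gap the post's taxonomy points at shows up even here: the calculation is trivial to write down, but an agent has to actually notice which conditional probabilities its situation licenses before either formula applies.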

I appreciate the clarity and thoroughness of this post. It's a useful distillation of communication norms that nurture understanding and truth-seeking. As someone who has sometimes struggled with getting my point across effectively, these guidelines serve as a solid reminder to stay grounded in both purpose and goodwill. It's reassuring to know that there's an ever-evolving community working towards refining the art of conversation for collective growth.