Comments

jr10

Thanks again for sharing this opportunity, Rohin! Do you know when applicants can expect to hear back about interviews?

jr30

We’ve increased the weight of chickens by about 40 standard deviations relative to their wild ancestors, the red junglefowl. That’s the equivalent of making a human being that is 14 feet tall

 

I realize this is a very trivial matter on a very interesting post, and I don't like making a habit of nitpicking. But this feels interesting for some reason. Perhaps it's just because of the disturbing chicken visuals, I don't know. 

To my credit, I actually made an effort to figure out how the author reached their conclusion, and I believe I did. The average adult male is 69" tall and the standard deviation is about 2.5", so 40*2.5" + 69" ≈ 14 feet. Still, the conclusion felt intuitively incorrect, which I assumed had to do with comparing 3D vs. 1D metrics. So I asked ChatGPT whether the conclusion was correct (to confirm my intuition and perhaps get an explanation why). I'm guessing its assessment of the general nature of the error was correct, but things started going south quickly once it started freestyling with the logic and math.
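To make that arithmetic easy to check, here is a minimal sketch in Python. The 69-inch mean and 2.5-inch standard deviation are the rough figures I assumed above, not authoritative values.

```python
# Sanity check of the "40 standard deviations" height figure,
# using the rough values assumed above: a mean adult male height
# of 69 inches and a standard deviation of 2.5 inches.
mean_height_in = 69.0
sd_height_in = 2.5

height_40_sigma_in = mean_height_in + 40 * sd_height_in
print(f"40-sigma height: {height_40_sigma_in:.0f} in "
      f"= {height_40_sigma_in / 12:.1f} ft")  # 169 in, ~14.1 ft
```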

The good

  • However, this is incorrect because height and weight scale differently.
    • Height follows roughly linear scaling.
    • Weight follows cubic scaling (since volume increases with the cube of height).

The bad

If we assume an average human weighs 70 kg (154 lbs) and apply a 40σ increase, their weight would be unrecognizably large—potentially thousands of kilograms.

Interestingly, I then asked it what adult human weight would be after a 40σ increase, and it correctly calculated it as 670 kg (~1450 lbs).
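Out of curiosity, here is a similar sketch that reproduces that weight figure and also shows what a proportionally scaled-up 14-foot human would weigh under the cubic scaling ChatGPT mentioned. The 70 kg mean comes from its answer, but the 15 kg standard deviation is my own assumption (it never stated the value it used), so treat the numbers as illustrative only.

```python
# Rough check of the weight figures. The 70 kg mean comes from
# ChatGPT's answer; the 15 kg standard deviation is my own assumption,
# chosen because it reproduces the 670 kg result.
mean_weight_kg = 70.0
sd_weight_kg = 15.0  # assumed; ChatGPT did not state the SD it used

weight_40_sigma_kg = mean_weight_kg + 40 * sd_weight_kg
print(f"40-sigma weight: {weight_40_sigma_kg:.0f} kg "
      f"(~{weight_40_sigma_kg * 2.205:.0f} lbs)")  # 670 kg, ~1477 lbs

# Cubic scaling, for comparison: a person 169/69 times taller with the
# same proportions would weigh roughly (169/69)**3 times as much.
scale = 169.0 / 69.0
weight_cubic_kg = mean_weight_kg * scale ** 3
print(f"Geometrically scaled 14-ft human: ~{weight_cubic_kg:.0f} kg")  # ~1028 kg
```

The gap between the two results (~670 kg vs. ~1030 kg) is essentially the linear-vs-cubic point from the quoted answer.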

Then I asked it whether its responses were consistent, and that's when it started to get really... creative.

jr20

TL;DR - I just recalled Google Jigsaw, which might be one effort to address my concerns. I would love to hear your thoughts on it if you are familiar with it.

(Read on below if you prefer the nuance)

As I was just rereading and making minor edits to my comment, and considering whether I needed to substantiate my statement about social divisions, I recalled that I had discovered Google's Project Jigsaw in March 2024, which seemed to be working on various smaller-scale initiatives intended to address these concerns using technology. When I checked it out again just now, I saw that they had shifted their focus over the summer, which seems to be another positive step toward addressing my concerns. In particular, this:

Over the past year, Jigsaw has been exploring how to make large-scale online conversations, particularly online deliberations, more impactful and scalable, and to facilitate their use in a wider array of contexts.

Working with that team would be as close as I could imagine to a dream job, [1] and I believe I might be able to bring significant value. If you know anything about it, I'd love to hear your take on the work they are doing, and whether/how it might relate to the current discussion. Thanks! 

  1. ^

    It occurs to me I said something similar to Rohin Shah recently, which was completely sincere, but honestly Jigsaw is likely an even stronger mutual match. When I reflect on how Jigsaw could have possibly fallen off my radar, I have to admit it has been an extremely stressful year; there had been no open positions, and it appeared all positions were on-site in New York (a difficult proposition for my wife and kids), so I had forced myself to relegate it to a longer-term goal.

jr*10

Thanks so much for your thoughts, Dave.

I agree humans have always been misaligned, and that in many ways we have made significant advancements in alignment over long time frames. However, I think few would deny that any metric approximating alignment would be quite volatile over shorter time frames or within specific populations, which creates periods of greater risk.

I agree that there must be something new that increases that existential risk to justify significant concern. You identified bioweapons as one example, which I agree is a risk, but not the specific one I am concerned about. 

The new factors I am concerned about are:

  • The vastly increased ease and ability of small groups of misaligned actors to significantly alter, manipulate, or undermine large numbers of other humans' capacities for alignment. This seems largely tied to social media. As evidence, I would point to the sharp increase in social divisions in the US in recent years.
  • The introduction of AI that allows individuals to project their misaligned will and power without having to involve or persuade other individuals who previously would have exerted some degree of influence toward realignment.

 

It seems to be putting the cart before the horse to be spending so much time, money, effort, and thought on AI alignment while our alignment as humans is so poor. Understanding the nature and roots of our misalignment, and identifying how to use technology to increase our alignment rather than undermine it, seems to me to be an obvious prerequisite (or at least co-requisite) to being able to trust ourselves to use powerful AI in ways that don't decrease alignment. While recent years may have presented conditions that were especially effective at exploiting vulnerabilities in our capacities for maintaining alignment, those vulnerabilities have always been and always will be a risk, so we will remain the weakest link in the alignment equation until we put serious effort into elevating ourselves to the same standards we expect to hold AI to.

Just to clarify, I am not at all suggesting putting less effort into AI alignment. Just proposing that perhaps putting more effort into human alignment would be wise, and likely mutually beneficial in conjunction with AI Alignment efforts. [1] 

This is one of the arguments in favor of the AGI project.

Could you please explain (or point me to) the specific argument in favor of the AGI project that you had in mind here, so I don't risk making incorrect assumptions? I apologize that I'm not yet as familiar with other perspectives as I'd like to be. Also, I'd love to hear your take on my additional thoughts.

Thank you for engaging, I find the dialogue very helpful.

  1. ^

    I acknowledge there are likely efforts to improve human alignment that I am unaware of, so my intuitive assessment of a deficit may be inaccurate.

jr30

Thanks for that info; this was one of my questions too. One follow-up: this causes me some cognitive dissonance and uncertainty. In my understanding, Quick Takes are generally for rougher or exploratory thoughts, while Posts are for relatively well-thought-out and polished writing. While Questions are by nature exploratory, they would be displayed in the Posts list. In my current understanding, the appropriate mechanism for sharing a thought, in order of increasing confidence in its merit and contextual value, is:
1. Don't share it yet (value too unclear, further consideration appropriate)
2. Comment
3. Quick Take
4. Post

I realize I could informally ask a question in a comment (as I am now) or quick take, but there is an explicit mechanism for Questions. So, I'd like to use that, but I also want to ensure any Questions I create not only have sufficient merit, but are also of sufficient value to the community. Rather than explicitly ask you for guidance, it seems reasonable to me to give it a try slowly, starting with a question that seems both significant and relevant, and see whether it elicits community engagement. If you have anything to add though, that's welcome.

Thanks again for the helpful info.

(Just an update: I think I resolved the dissonance, if not all the uncertainty. The dissonance was around my greater reluctance to create a Post than a Question, despite the two being grouped together. I also realized that identifying consequential questions is something I feel is a personal strength, one that is more likely to be of value to the community and that also offers a means of understanding its perspectives and their important implications.)

jr20

I am new here and very excited to have discovered this community, as I rarely encounter anyone else who strongly embodies the qualities and values that I share with you. I am eager to engage, and expect that it will be mutually beneficial. 

At the same time, I have great admiration and respect for the people here and the community you have created, so I would like to ensure that the impact of the ways I engage matches my intent. Ideally, I would love to fully absorb all the perspectives already offered here before attempting to offer bits of my own. However, that's just not practical, so I will do my best to engage in ways that are mindful of the limits of my knowledge and contribute positively to discussions and to the community. If (and when) I fall short of those goals, I would really appreciate and welcome your feedback.

To that end, I have several questions already, which may or may not require a little back and forth. Is this the best place to ask those, or should I use Intercom instead, or perhaps even Discord? (I really appreciate the very helpful welcome guide and this thread, and will try to avoid asking questions they already answer. There's an awful lot to process, though.)

Thanks so much for your assistance, and your dedication to maintaining this community.

jr10

Thank you for posting those here, Rohin. Your work aligns with my values, passions, and strengths more than anything else I’m aware of, and I would love to be a part of it.

Would you be willing to help me identify whether and how I might be able to position myself as an attractive candidate for either of these roles? Or if that’s not a realistic immediate possibility, perhaps recommend an alternative role I might be a stronger candidate for which would allow me to support the same objectives?

I regret having to ask for help, especially without having established trust and credibility within this community already. However, if you’re feeling generous, I would really appreciate your insights. 

jr*10

Very informative, lots I can agree with, and plenty to chew on. I love what I see as your intentions and values on a human level, and how those affect your approach, which we seem to be fundamentally “aligned” on. 

During my initial read, I noted some differences in our perspectives — not disagreements per se, likely just a product of the inherent incompleteness of our respective “training environments”. Your perspective certainly helped fill in many gaps in my perspective. And while I don’t want to be presumptuous, my intuition is that our perspectives are potentially complementary, so a discussion could be fruitful. I’ll start with the item that seems to be of greatest consequence, at least potentially.

I think it would be in the right ballpark to say that your overarching objective is to ensure that AGI (and especially super-human variants) is developed, deployed, and managed in a way that is safe, responsible, and ultimately increases altruism. And that the primary risk that concerns you is that an uncontrollable, malign AGI would be released into the world, whether through carelessness or malicious intent. I believe that is a worthy objective and a valid concern.

In my mind, there is another, far greater existential threat that concerns me, one that poses immediate dangers which will only grow, perhaps exponentially, as we approach AGI and beyond. Please hear me out. I’m not interested in being defeatist or alarmist. Like you, I am only interested in pragmatic solutions, and I do have some potential ideas to that end.

Keeping that in mind, I believe the far greater threat is humans — specifically, the degree to which we are not “aligned” as you use that word. Obviously, there are those who are consciously and intentionally malicious, and that’s always a concern. However, that’s not my primary concern. It has become abundantly clear to me in recent years that the gravest danger is posed by those who are unwittingly misaligned, or who can be manipulated into being so, especially in large numbers. To be more even-handed and accurate, this misalignment is something in which we all share blame and responsibility to some extent, and repairing it will require collective action from all of us.

Even at current levels of capability, AI can be used by humans for malicious purposes. I am concerned about the impacts that will have well before we get anywhere near AGI, and that those impacts could severely jeopardize your ability to achieve your objectives.

Additionally, in my mind, it doesn’t matter how pure and noble the AI agents are that you develop, even if you somehow beat everyone else to the finish line. Because as long as humans remain as misaligned as we currently are, we will be too divided to protect ourselves, and the worst of us will almost certainly infiltrate and corrupt the AGI agent population for their malign purposes. So the fundamental threat we need to address urgently and before AGI is reducing the degree of misalignment among ourselves as human beings.

For the record, I do have some ideas about how we might achieve that, and at least one relatively concrete way to help harness the power of AI for that purpose, which may at least be worth harvesting for parts if nothing else.

I’m open to discussing any of this further, but for now, I’d really love to hear your thoughts, and would consider that an honor.