Writer's Shortform — LessWrong

by Writer

14th Jan 2023

This is a special post for quick takes by Writer. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

39 comments, sorted by top scoring

[-]Writer1y*26-14

Surprised that there's no linkpost about Dan H's new paper on Utility Engineering. It looks super important, unless I'm missing something. LLMs are now utility maximisers? For real? We should talk about it: https://x.com/DanHendrycks/status/1889344074098057439

I feel weird about doing a link post since I mostly post updates about Rational Animations, but if no one does it, I'm going to make one eventually.

Also, please tell me if you think this isn't as important as it looks to me somehow.

EDIT: Ah! Here it is! https://www.lesswrong.com/posts/SFsifzfZotd3NLJax/utility-engineering-analyzing-and-controlling-emergent-value thanks @Matrice Jacobine!

[-]habryka1y*4628

FWIW, my sense is that it's a bad paper. I expect other people will come out with critiques in the next few days that will expand on that, but I will write something if no one has done it in a week or two. I think the paper notices some interesting weak correlations, but man, it really doesn't feel like the way you would go about answering the central question it is trying to answer and I keep having the feeling of it very much having been written to produce the thing that on the most shallow read will produce the most surface-level similar object in order to persuade and be socially viral, and not to inform.

[-]Writer1y50

Thanks! I already don't feel as impressed by the paper as I was while writing the shortform, and I feel a little embarrassed for not thinking things through a bit more before posting my reactions. At least now there's some discussion under the linkpost, so I don't entirely regret my comment if it prompted people to give their takes. I still feel I've updated in a non-negligible way from the paper though, so maybe I'm still not as pessimistic about it as other people. I'd definitely be interested in your thoughts if you find discourse is still lacking in a week or two.

[-]Writer2y*2011

For me, perhaps the biggest takeaway from Aschenbrenner's manifesto is that even if we solve alignment, we still have an incredibly thorny coordination problem between the US and China, in which each is massively incentivized to race ahead and develop military power using superintelligence, putting them both and the rest of the world at immense risk. And I wonder if, after seeing this in advance, we can sit down and solve this coordination problem in ways that lead to a better outcome with a higher chance than the "race ahead" strategy and don't risk encountering a short period of incredibly volatile geopolitical instability in which both nations develop and possibly use never-seen-before weapons of mass destruction.

Edit: although I can see how attempts at intervening in any way and raising the salience of the issue risk making the situation worse.

[-]tmeanen2y30

Plausibly one technology that arrives soon after superintelligence is powerful surveillance technology that makes enforcing commitments significantly easier than it historically has been. Leaving aside the potential for this to be misused for authoritarian government, advocating for this to be developed before powerful technologies of mass destruction may be a strategy.

[-]Writer2y72

RA has started producing shorts. Here's the first one using original animation and script: https://www.youtube.com/shorts/4xS3yykCIHU

The LW short-form feed seems like a good place for posting some of them.

[-]Writer3y77

This is infuriating somehow lol

[-]Writer2y62

Was Bing responding in Tibetan to some emojis already discussed on LW? I can't find a previous discussion about it here. I would have expected people to find this phenomenon after the SolidGoldMagikarp post, unless it's a new failure mode for some reason.

[-]Writer2y*40

Stories of AI takeover often involve some form of hacking. This seems like a pretty good reason for using (maybe relatively narrow) AI to improve software security worldwide. Luckily, the private sector should cover it in good measure for financial interests.

I also wonder if the balance of offense vs. defense favors defense here. Usually, recognizing is easier than generating, and this could apply to malicious software. We may have excellent AI antiviruses devoted to the recognizing part, while the AI attackers would have to do the generating part.

[Edit: I'm unsure about the second paragraph here. I'm feeling better about the first paragraph, especially given slow multipolar takeoff and similar, not sure about fast unipolar takeoff]

[-]quetzal_rainbow2y52

Hacking is usually not about writing malicious software, it's about finding vulnerabilities. You can avoid vulnerabilities entirely by provably safe software, but you still need safe hardware, which is tricky, and provably safe software is hell in development. It would be nice if AI companies used provably safe sandboxing, but it would require enormous coordination effort. And I feel really uneasy about training AI on finding vulnerabilities.

[-]faul_sname2y52

Also "provably safe" is a property a system can have relative to a specific threat model. Many vulnerabilities come from the engineer having an incomplete or incorrect threat model, though (most obviously the multitude of types of side-channel attack).

[-]Writer2y40

Yoshua Bengio is looking for postdocs for alignment work:

I am looking for postdocs, research engineers and research scientists who would like to join me in one form or another in figuring out AI alignment with probabilistic safety guarantees, along the lines of the research program described in my keynote (https://www.alignment-workshop.com/nola-2023) at the New Orleans December 2023 Alignment Workshop.
I am also specifically looking for a postdoc with a strong mathematical background (ideally an actual math or math+physics or math+CS degree) to take a leadership role in supervising the Mila research on probabilistic inference and GFlowNets, with applications in AI safety, system 2 deep learning, and AI for science.
Please contact me if you are interested.

[-]Writer3y40

Rational Animations has a subreddit: https://www.reddit.com/r/RationalAnimations/

I hadn't advertised it until now because I had to find someone to help moderate it.

I want people here to be among the first to join since I expect having LessWrong users early on would help foster a good epistemic culture.

[-]Writer3y40

I'm evaluating how much I should invite people from the channel to LessWrong, so I've made a market to gauge how many people would create a LessWrong account given some very aggressive publicity, so I can get a per-video upper bound. I'm not taking any unilateral action on things like that, and I'll make a LessWrong post to hear the opinions of users and mods here after I get more traders on this market.

[-]Chris_Leong3y30

I guess one thing to think about is that Less Wrong is somewhat stricter on moderation than EA, so I wonder if inviting people to the EA forum would be a more welcoming experience?

[-]Writer3y30

I was thinking about publishing the post to hear what users and mods think on the EA Forum too, since some videos would link to EA Forum posts, while others to LW posts.

I agree that moderation is less strict on the EA Forum and that users would have a more welcoming experience. On the other hand, the more stringent moderation on LessWrong makes me more optimistic about LessWrong being able to withstand a large influx of new users without degrading the culture. Recent changes by moderators, such as the rejected content section, make me more optimistic than I was in the past.

[-]Chris_Leong3y31

If you mention Less Wrong, you might want to think carefully about how to properly set expectations.

[-]Writer3y43

After reading this article by Holden and this tweet by Sam Altman I want even more to talk about the very cruxes of AI Alignment on Rational Animations. The video about the most important century, for example, is something we'll do less, and we're going straight to AI notkilleveryonism.

[-]Writer8mo30

I’m about 2/3 of the way through watching “Orb: On the Movements of the Earth.” It’s an anime about heliocentrism. It’s not the real story of the idea, but it’s not that far off, either. It has different characters and perhaps a slightly different Europe. I was somehow hesitant to start it, but it’s very good! I don’t think I’ve ever watched a series that’s as much about science as this one.

[-]Writer2y30

Here's a new RA short about AI Safety: https://www.youtube.com/shorts/4LlGJd2OhdQ

This topic might be less relevant given today's AI industry and the fast advancements in robotics. But I also see shorts as a way to cover topics that I still think constitute fairly important context but that, for some reason, wouldn't be the most efficient use of resources to cover in long form.

[-]Writer3y32

I've made a poll.

I'm curious to hear thoughts on this topic.

[-]JBlack3y1-2

There is not enough information to determine the answer.

To continue the thought experiment suppose that Alpha is "locked in", unable to produce any actions at all but capable of thought and sensation. The actions of such a person can be simulated faithfully very easily without simulating any of their thoughts or sensations. In more ordinary cases there is a much greater link between internal states and external actions, so perhaps it is plausible that a sufficiently accurate model of the actions might require running through the thoughts and sensations in essentially the same way that a whole brain emulation would.

We don't know whether that would be true in the real world, and in a hypothetical thought experiment that might not even conform to whatever rules reality abides by, we can't know that.

So put me down for "Not Sure", but not in the sense that the question has a definite answer that I don't know. I am very sure that the question itself is indefinite.

[-]green_leaf3y1-2

The actions of such a person can be simulated faithfully very easily without simulating any of their thoughts or sensations.

You can still talk with such a person by reading their brain state from a superpowered-fMRI-from-the-future, and them listening to your words.

(Talking to someone/interacting with someone's behavior is just a simplified way of saying "both-sided information transfer with the system," where you transmit the information to the system (in whatever way) and the system will generate the response the person is giving (also in whatever way).)

To the extent to which your thoughts and feelings are connected to your consciousness in any way, they can be elicited, either by the LLM computing your response (because they impact your words somehow), or by me asking how you feel (and the LLM therefore having to figure out the answer).

To the extent to which your thoughts and feelings never influence your output in any way for any possible input, their existence is meaningless.

[-]green_leaf3y1-2

Yes. There is no other answer possible.

[-]Writer3y37

Should I pin this comment under the Sorting Pebbles video?

It's the most liked right now, but usually even the most liked comments lose visibility over time.

Use agree/disagree votes to express whether you agree or disagree with pinning it.

[-]Writer3mo2-1

I watched the first episode of Pluto (about 1 hour long), and the second part of it is entirely about a blind old pianist and his robot butler, North N.2. I liked that part a lot and wanted to share a couple of interesting things that are in it (free of important spoilers):

1. The pianist kinda hates the robot, he's rude to it, and he's convinced the robot can't "truly" sing or play piano. Everything music-wise that comes out of the robot must be soulless.

2. The robot doesn't mind the rudeness, but it's also slightly adversarial to the pianist. It has its own goal of wanting to learn the piano. Despite the pianist's request that the robot not touch the piano, it does so anyway and repeatedly asks the pianist to teach it to play. But the robot clearly has the pianist's interest at heart too. It goes out of its way to help him, both in straightforward robot-butler ways and in more nuanced, unexpected ways that require more independent agency.

3. The robot's adversarialness ends up helping the old pianist, too.

Part of why I liked the episode is the robot's alignment. It has its own thing going on, but it also has the pianist's interest at heart, and it's also possible for it to disobey the pianist, do its own thing, and for both of them to be better off anyway.

Opinions about the plausibility of achieving this type of alignment may vary, but as a thing to aim for, it seems quite decent? You get AI that cares about humans, but it's also endowed with its own independence. I don't think it's that far from what Anthropic has been trying to do lately, either. They seem to care about what their AIs desire.

The whole part is just quite beautiful for other reasons that are less relevant here, and I would definitely recommend it. I didn't like the first part of the episode that much, though, which is almost completely unrelated and has different characters.

[-]Writer2y20

Maybe obvious sci-fi idea: generative AI, but it generates human minds

[-]Writer3y20

This post by Jeffrey Ladish was a pretty motivating read: https://www.facebook.com/jeffladish/posts/pfbid02wV7ZNLLNEJyw5wokZCGv1eqan6XqCidnMTGj18mQYG1ZrnZ2zbrzH3nHLeNJPxo3l

[-]jam_brand3y20

Also posted on his shortform :) https://www.lesswrong.com/posts/fxfsc4SWKfpnDHY97/landfish-lab?commentId=jLDkgAzZSPPyQgX7i

[-]Writer3y20

I'm not sure how surprising this should actually be, but I find it notable that LessWrong still remains relatively insular despite apparently being in the information diet of many famous people and online personalities.

[-]Writer3y*10

I seriously doubt comments like these are making the situation better (https://twitter.com/Liv_Boeree/status/1637902478472630275, https://twitter.com/primalpoly/status/1637896523676811269)

Edit: on the other hand...

[-]Writer3y20

Unsurprisingly, Eliezer is better at it: https://twitter.com/ESYudkowsky/status/1638092609691488258

Still a bit dismissive, but he took the opportunity to reply to a precise object-level comment with another precise object-level comment.

[-]Writer3y10

Yann LeCun on Facebook:

I think that the magnitude of the AI alignment problem has been ridiculously overblown & our ability to solve it widely underestimated.
I've been publicly called stupid before, but never as often as by the "AI is a significant existential risk" crowd.
That's OK, I'm used to it.

[-]Writer3y23

Devastating and utter communication failure?

[-]Writer3y10

Would it be possible to use a huge model (e.g. an LLM) to interpret smaller networks, and output human-readable explanations? Is anyone working on something along these lines?

I'm aware Kayla Lewis is working on something similar (but not quite the same thing) on a small scale. In my understanding, from reading her tweets, she's using a network to predict the outputs of another network by reading its activations.
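To make the "one network reads another network's activations" idea concrete, here is a minimal toy sketch. It is not Kayla Lewis's method and not an LLM-based interpreter: the tiny target network, its random weights, and the names `target_forward`, `train_probe`, and `probe_predict` are all hypothetical, chosen only for illustration. A linear probe is trained to predict the target network's output from its hidden activations alone.

```python
import math
import random

random.seed(0)

# Hypothetical "target" network: 2 inputs -> 3 tanh hidden units -> 1 linear output.
# The weights are random; nothing here comes from a real model.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
W2 = [random.uniform(-1, 1) for _ in range(3)]


def target_forward(x):
    """Return (hidden activations, output) of the target network."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    output = sum(w * h for w, h in zip(W2, hidden))
    return hidden, output


def train_probe(samples, lr=0.1, epochs=500):
    """Fit a linear probe (weights + bias) from activations to outputs with SGD."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for hidden, output in samples:
            pred = sum(w * h for w, h in zip(weights, hidden)) + bias
            err = pred - output  # gradient of 0.5 * err**2 w.r.t. pred
            weights = [w - lr * err * h for w, h in zip(weights, hidden)]
            bias -= lr * err
    return weights, bias


def probe_predict(probe, hidden):
    """Predict the target network's output from its hidden activations."""
    weights, bias = probe
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def sample():
    """Run the target network on a random input and record what the probe sees."""
    return target_forward([random.uniform(-2, 2), random.uniform(-2, 2)])


train = [sample() for _ in range(200)]
test = [sample() for _ in range(50)]

probe = train_probe(train)
mean_abs_error = sum(abs(probe_predict(probe, h) - y) for h, y in test) / len(test)
print(f"mean absolute probe error: {mean_abs_error:.4f}")
```

In this toy setup the output is an exact linear function of the hidden layer, so the probe can fit it almost perfectly. Producing human-readable explanations of a real network, as the question above asks, would be a much harder problem.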

[-]Writer3y10

Is the Simulators frame essentially correct?

Agreevote to say "Yes".

Disagreevote to say "No".

[-]Writer3y10

I'm not sure, but an interesting operationalization could be "the simulators frame is correct enough that general intelligences can be simulated by LLMs".

(I decided to write this as a reply rather than in the parent comment because I don't want it to define my question above, since people might disagree about the right way to operationalize it.)

[-]Writer3y10

What about "AGI X-risk" and "AGI Doom"?

[-]Writer3y10

AGI ¬Doom

[+][comment deleted]3y10
