LESSWRONG

Paul Crowley

From London, now living in the Santa Cruz mountains.

Comments (sorted by newest)

Ten people on the inside
Paul Crowley · 7mo

Also Rosie Campbell https://x.com/RosieCampbell/status/1863017727063113803

Magical Categories
Paul Crowley · 8mo

Not being able to figure out what sort of thing humans would rate highly isn't an alignment failure, it's a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you're saying he would. He is right to say that attempting to directly encode the category boundaries won't work. It isn't covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying to directly encode those values, precisely so that capabilities can help with alignment. The trouble is that even pointing at this category is difficult - more difficult than pointing at "gets high ratings".

Magical Categories
Paul Crowley · 8mo

I'm not quite seeing how this negates my point; help me out?

  • Eliezer sometimes spoke of AIs as if they had a "reward channel"
  • But they don't; instead they are something a bit like "adaptation executors, not fitness maximizers"
  • This is potentially an interesting misprediction!
  • Eliezer also said that if you give the AI the goal of maximizing smiley faces, it will make tiny molecular ones
  • TurnTrout points out that if you ask an LLM if that would be a good thing to do, it says no
  • My point is that this is exactly what Eliezer would have predicted for an LLM whose reward channel was "maximize reader scores"
  • Our LLMs tend to produce high reader scores for a reason that's not exactly "they're trying to maximize their reward channel"
  • I don't at all see how this difference makes a difference! Eliezer would always have predicted that an AI aimed at maximizing reader scores would have produced a response to TurnTrout's question that maximized reader scores, so it's silly to present them doing so as a gotcha!
Magical Categories
Paul Crowley · 8mo

In this instance the problem the AI is optimizing for isn't "maximize smiley faces", it's "produce outputs that human raters give high scores to". And it's done well on that metric, given that the LLM isn't powerful enough to subvert the reward channel.

Using axis lines for good or evil
Paul Crowley · 1y

I'm sad that the post doesn't go on to say how to get matplotlib to do the right thing in each case!

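For reference, a minimal sketch of one way to do this in matplotlib (my own illustration, not from the post, assuming "the right thing" means hiding the top and right spines so only the left and bottom axis lines remain):

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))

    # Hide the top and right spines; keep only the left and bottom axis lines.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    # If the data crosses zero, the remaining spines can instead be anchored
    # at the origin rather than the plot edge:
    # ax.spines["left"].set_position("zero")
    # ax.spines["bottom"].set_position("zero")

    plt.show()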
Nathan Helm-Burger's Shortform
Paul Crowley · 2y

I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?

Nathan Helm-Burger's Shortform
Paul Crowley · 2y

The difficult thing is tying the signature to the thing signed. Even if the signatures are single-use, unless the relying party sees everything you ever sign immediately, a signature can be lifted from something you did sign that the relying party never saw and attached to something you didn't sign.

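For contrast, a minimal sketch of the standard digital-signature case the thread mentions (my own illustration, using the third-party Python "cryptography" package; the example documents are invented): the signature verifies only against the exact bytes that were signed, so it cannot be transferred to different content.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    signed_doc = b"Document the relying party never saw"
    other_doc = b"Document that was never signed"

    signature = private_key.sign(signed_doc)

    # The signature checks out against the bytes that were actually signed...
    public_key.verify(signature, signed_doc)  # no exception raised

    # ...but it cannot be transferred to a different document.
    try:
        public_key.verify(signature, other_doc)
    except InvalidSignature:
        print("signature does not verify against a different document")

A scheme for signing physical things would need an equivalent binding, e.g. having the signature commit to a hash of the object's contents, to get the same property.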
Effective Aspersions: How the Nonlinear Investigation Went Wrong
Paul Crowley · 2y

Of course this market is "Conditioning on Nonlinear bringing a lawsuit, how likely are they to win?", which is a different question.

Paul Crowley's Shortform
Paul Crowley · 2y

Extracted from a Facebook comment:

I don't think the experts are expert on this question at all. Eliezer's train of thought essentially started with "Supposing you had a really effective AI, what would follow from that?" His thinking wasn't at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn't expertise on what the results are when it's as effective as Eliezer posits. It's like thinking you shouldn't have an opinion on whether there will be a nuclear conflict over Kashmir unless you're a nuclear physicist.

Campaign for AI Safety: Please join me
Paul Crowley · 2y

Thanks, that's useful. Sad to see no Eliezer, no Nate, and no one from MIRI or with a similar perspective, though :(

Posts (sorted by new)

A one-question Turing test for GPT-3 · 85 karma · 4y · 25 comments
Paul Crowley's Shortform · 9 karma · 5y · 11 comments
Why no total winner? · 37 karma · 8y · 19 comments
Circles of discussion · 33 karma · 9y · 42 comments
Bill Gates: problem of strong AI with conflicting goals "very worthy of study and time" · 73 karma · 11y · 18 comments
Slides online from "The Future of AI: Opportunities and Challenges" · 21 karma · 11y · 10 comments
Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · 78 karma · 11y · 52 comments
Robin Hanson's "Overcoming Bias" posts as an e-book · 30 karma · 11y · 6 comments
Open thread for December 17-23, 2013 · 10 karma · 12y · 306 comments
A diagram for a simple two-player game · 56 karma · 12y · 6 comments

Wikitag Contributions

The Hanson-Yudkowsky AI-Foom Debate · 8y · (+128)
The Hanson-Yudkowsky AI-Foom Debate · 8y · (+7/-7)
Squiggle Maximizer (formerly "Paperclip maximizer") · 8y · (+16/-19)
Squiggle Maximizer (formerly "Paperclip maximizer") · 8y · (+134/-83)
R:A-Z Errata · 9y · (+21/-7)
R:A-Z Errata · 9y · (+582)
Holden Karnofsky · 10y · (+1118)
Sequences · 10y · (+2202/-2283)
The Hanson-Yudkowsky AI-Foom Debate · 11y · (+336/-5)
Great Filter · 11y · (+11/-13)