All of Paul Crowley's Comments + Replies

Also Rosie Campbell https://x.com/RosieCampbell/status/1863017727063113803

Not being able to figure out what sort of thing humans would rate highly isn't an alignment failure, it's a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you're saying he would. He is right to say that attempting to directly encode the category boundaries won't work. It isn't covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying t... (read more)

I'm not quite seeing how this negates my point, help me out?

  • Eliezer sometimes spoke of AIs as if they had a "reward channel"
  • But they don't; instead they are something a bit like "adaptation executers, not fitness maximizers"
  • This is potentially an interesting misprediction!
  • Eliezer also said that if you give the AI the goal of maximizing smiley faces, it will make tiny molecular ones
  • TurnTrout points out that if you ask an LLM if that would be a good thing to do, it says no
  • My point is that this is exactly what Eliezer would have predicted for an LLM whose reward
... (read more)
6Zack_M_Davis
I think we probably don't disagree much; I regret any miscommunication. If the intent of the great-grandparent was just to make the narrow point that an AI that wanted the user to reward it could choose to say things that would lead to it being rewarded, which is compatible with (indeed, predicts) answering the molecular smiley-face question correctly, then I agree. Treating the screenshot as evidence in the way that TurnTrout is doing requires more assumptions about the properties of LLMs in particular. I read your claims regarding "the problem the AI is optimizing for [...] given that the LLM isn't powerful enough to subvert the reward channel" as taking as given different assumptions about the properties of LLMs in particular (viz., that they're reward-optimizers) without taking into account that the person you were responding to is known to disagree.
7Martin Randall
This article does not predict that LLM behavior. Here's another quote from it:

Here, the category boundary you are describing is "outputs that human raters give high scores to". That is a complex category of human values. This is squarely in both "formal fallacies" described by the article, the fallacy of "underestimating the complexity of a concept we develop for the sake of its value" and the fallacy of "anthropomorphic optimism".

My reading is that, if this article is correct, then an AI trained to "produce outputs that human raters give high scores to" will instead produce out-of-distribution text that fits the category the AI learned, and not the category we wanted the AI to learn, especially when placed in novel situations. Less like Claude, more like Sydney and Bing. You apparently have the opposite reading to me. I don't see it, at all.

----------------------------------------

I think TurnTrout's point is that in order for the AI to succeed at the "magical category" pointed at by the words "outputs that human raters give high scores to", it has to also have learned the strictly easier "unnatural category" pointed at by the words "making people smile". And the results show that it has learned that.

In this instance the problem the AI is optimizing for isn't "maximize smiley faces", it's "produce outputs that human raters give high scores to". And it's done well on that metric, given that the LLM isn't powerful enough to subvert the reward channel.

4Zack_M_Davis
This isn't a productive response to TurnTrout in particular, who has written extensively about his reasons for being skeptical that contemporary AI training setups produce reward optimizers (which doesn't mean he's necessarily right, but the parent comment isn't moving the debate forward).

I'm sad that the post doesn't go on to say how to get matplotlib to do the right thing in each case!

I think matplotlib has way too many ways to do everything to be comprehensive! But I think you could do almost everything with some variants of these.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.spines['top'].set_visible(False)  # hide one border; or 'left' / 'right' / 'bottom'
ax.set_xticks([0, 50, 100], ['0%', '50%', '100%'])  # set tick positions and labels together
ax.tick_params(axis='x', bottom=False, top=False)  # hide x tick marks; for axis='y' use left/right
ax.set_ylim([0, 0.30])  # fix the y range explicitly
ax.set_ylim([0, ax.get_ylim()[1]])  # keep the auto top limit but pin the bottom to zero

I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?

4Nathan Helm-Burger
The idea is that the device would have a camera, do OCR on the text, hash that, incorporate the hash into the stamp design somehow, and then you'd stamp it.

The difficult thing is tying the signature to the thing signed. Even if signatures are single-use, unless the relying party immediately sees everything you ever sign, a signature can be transferred from something you signed that the relying party didn't see onto something you didn't sign.

2Nathan Helm-Burger
What if the signature contains a hash of the document it was created for, so that it will not match a different document if transferred?
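A minimal sketch of that binding, assuming Python with the `cryptography` package (the normalization step and all names here are illustrative, not Nathan's actual design):

import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def document_digest(ocr_text: str) -> bytes:
    # Normalize whitespace and case so minor OCR noise yields the same digest.
    normalized = " ".join(ocr_text.split()).lower()
    return hashlib.sha256(normalized.encode()).digest()

device_key = Ed25519PrivateKey.generate()  # would live inside the stamp device
signature = device_key.sign(document_digest("I agree to the terms above."))
# The stamp renders an encoding of `signature`; a verifier re-runs OCR,
# recomputes the digest, and checks the signature against the device's public key.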

Of course this market is "Conditioning on Nonlinear bringing a lawsuit, how likely are they to win?" which is a different question.

Extracted from a Facebook comment:

I don't think the experts are expert on this question at all. Eliezer's train of thought essentially started with "Supposing you had a really effective AI, what would follow from that?" His thinking wasn't at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn't expertise on what the results are when it's as effective as Eliezer posits. It's like thinking you shouldn't have an opinion on whether there will be a nuclear conflict over Kashmir unless you're a nuclear physicist.

9TurnTrout
(Replying without the context I imagine to be present here) I agree with a version of this which goes "just knowing how to make SGD go brrr does not at all mean you have expertise for predicting what happens with effective AI."  I disagree with a version of this comment which means, "Having a lot of ML expertise doesn't mean you have expertise for thinking about effective AIs." Eliezer could have started off his train of thought by imagining systems which are not the kind of system which gets trained by SGD. There's no guarantee that thought experiments nominally about "effective AIs" are at all relevant to real-world effective AIs. (Example specific critique A of claims about minds-in-general, example specific critique B of attempts to use AIXI as a model of effective intelligence.)
2Viliam
Perhaps the response by experts is something like: "the only kind of AI we have are LLMs, and people who work with LLMs know that they cannot be really effective, therefore Eliezer's premises are not realistic?" Okay, it sounds stupid when I write it like this, so likely a strawman. But maybe it points in the right direction...

Thanks, that's useful. Sad to see no Eliezer, no Nate, nor anyone from MIRI or with a similar perspective, though :(

The lack of names on the website seems very odd.

Don't let your firm opinion get in the way of talking to people before you act. It was Elon's determination to act before talking to anyone that led to the creation of OpenAI, which seems to have sealed humanity's fate.

5simeon_c
I think that this is misleading to state it that way. There were definitely dinners and discussions with people around the creation of OpenAI.  https://timelines.issarice.com/wiki/Timeline_of_OpenAI  Months before the creation of OpenAI, there was a discussion including Chris Olah, Paul Christiano, and Dario Amodei on the starting of OpenAI: "Sam Altman sets up a dinner in Menlo Park, California to talk about starting an organization to do AI research. Attendees include Greg Brockman, Dario Amodei, Chris Olah, Paul Christiano, Ilya Sutskever, and Elon Musk."

This is explicitly the discussion the OP asked to avoid.

This is true whether we adopt my original idea that each board member keeps what they learn from these conversations entirely to themselves, or Ben's better proposed modification that it's confidential but can be shared with the whole board.

Perhaps this is a bad idea, but it has occurred to me that if I were a board member, I would want to quite frequently have confidential conversations with randomly selected employees.

6Ben Pace
I suspect "confidential within the board" will be much more useful.

For cryptographic security, I would use HMAC with a random key. Then to reveal, you publish both the message and the key. This allows you, for example, to securely commit to a one-character message like "Y".
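A minimal sketch of that commit/reveal flow in Python (standard library only; variable names are illustrative):

import hashlib, hmac, secrets

message = b"Y"                 # even a one-character message is safe to commit to
key = secrets.token_bytes(32)  # random key, kept secret until the reveal

commitment = hmac.new(key, message, hashlib.sha256).hexdigest()
# Publish `commitment` now; later, reveal (message, key) so anyone can verify:
assert hmac.new(key, message, hashlib.sha256).hexdigest() == commitment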

I sincerely doubt very many people would propose mayonnaise!

A jar of mayonnaise would work tolerably: put it on top of a corner of whichever side of the book is tending to swing over and close.

(I agree that I would expect most humans to do better than almost all the AI responses shown here.)

4Gunnar_Zarncke
There are always some people who will troll you or give a fun answer (the Lizardman's Constant), though they will be able to answer if prodded.

The idea is that I can do all this from my browser, including writing the code.

2ESRogs
Sounds a bit like AlphaSheets (RIP).
4Gunnar_Zarncke
That would be cool. I think it should be relatively easy to set up with replit (online IDE).

I'm not sure I see how this resembles what I described?

2Gunnar_Zarncke
Maybe I misunderstand what you have in mind? The idea is to
  * enter data in a spreadsheet,
  * have it interpreted as row-wise input to a function in a program (typically a unit test), and
  * have the result of the function added back into additional columns in the spreadsheet.
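A minimal non-web sketch of that loop, assuming plain CSV files and Python (file and column names are made up):

import csv

def under_test(x: int, y: int) -> int:  # the function being exercised
    return x + y

with open("cases.csv", newline="") as f:  # input columns: x, y
    rows = list(csv.DictReader(f))

for row in rows:
    row["result"] = under_test(int(row["x"]), int(row["y"]))  # add the output column

with open("cases_out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["x", "y", "result"])
    writer.writeheader()
    writer.writerows(rows)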

I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.

2JenniferRM
Subtracting out the "web-based" part as a first class requirement, while focusing on the bridge made of code as a "middle" from which to work "outwards" towards raw inputs and final results...

...I tend to do the first ~20 data entry actions as variable constants in my code that I tweak by hand, then switch to the CSV format for the next 10^2 to 10^5 data entry tasks that my data labelers work on, based on how I think it might work best (while giving them space for positive creativity).

A semi-common transitional pattern during the CSV stage involves using cloud spreadsheets (with multiple people logged in who can edit together and watch each other edit (which makes it sorta web-based, and also lets you use data labelers anywhere on the planet)) and ends with a copypasta out of the cloud and into a CSV that can be checked into git.

Data entry... leads to crashes... which leads to validation code... which leads to automated tooling to correct common human errors <3

If the label team does more than ~10^4 data entry actions, and the team is still using CSV, then I feel guilty about having failed to upgrade a step in the full pipeline (including the human parts) whose path of desire calls out for an infrastructure upgrade if it is being used that much. If they get to 10^5 labeling actions with that system and those resources then upper management is confused somehow (maybe headcount maxxing instead of result maxxing?) and fixing that confusion is... complicated.

This CSV growth stage is not perfect, but it is highly re-usable during exploratory sketch work on blue water projects because most of the components can be accomplished with a variety of non-trivial tools.

If you know of something better for these growth stages, I'd love to hear about your workflows; my own standard methods are mostly self constructed.
2Gunnar_Zarncke
There are tools that let you do that. There is a whole unit testing paradigm called fixtures for it. A prominent example is Fitnesse: http://fitnesse.org/FitNesse.UserGuide.WritingAcceptanceTests

(I'm considering putting a second cube on top to get five more filters per fan, which would also make it quieter.)

 

Four more filters per fan, right?

3Randomized, Controlled
Think it's four on the sides + one on the top?
2beisenpress
Street parking is easy. It's just a neighborhood park.  Worst case you might have to walk a couple of blocks if all the spots right by the park are taken up. It's at the end of a cul-de-sac.

I think this is diminishing marginal returns of consumption, not production.

2Dagon
Yes, diminishing marginal utility plus increasing marginal production capability is the recipe for specialization and trade. I think I agree with Eliezer (if I read correctly) that scarcity is the underlying motive - trade is only valuable if you want something and you can trade for it more easily/cheaply than you can produce it.

I would guess a lot of us picked the term up from Donald Norman's The Design of Everyday Things.

The image of this tweet isn't present here, only on Substack.

3Aaron Bergman
Thanks very much. Just fixed that. 

True; in addition, places vary a lot in their freak-tolerance.

If I lived in Wyoming and wanted to go to a fetish event, I guess I'm driving to maybe Denver, around 3 hours 40 minutes away? I know this isn't a consideration for everyone but it's important to me.

The same is basically true for any niche interest - it will only be fulfilled where there's adequate population to justify it. In my case, particular jazz music.

Probably a lot of people have different niche interests like that, even if they can't agree on one.

Why the 6in fan rather than the 8in one? Would seem to move a lot more air for nearly the same price.

3Richard Korzekwa
I think I was just trying to match the CFM of my Coway purifier, since I was using the same filters. I was also worried it would be harder to properly mate a larger/heavier fan to a box. Now that I've actually built the thing, I would say the larger fan is probably better.

Reminiscent of Freeman Dyson's 2005 answer to the question: "what do you believe is true even though you cannot prove it?":

Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers th
... (read more)
2TheMajor
How does the randomness of the digits imply that the statement cannot be proven? Superficially the quote seems to use two different notions of randomness, namely "we cannot detect any patterns" (i.e. a pure random generator is the best predictor we have) and "we have shown that there can be no patterns" (i.e. we have shown no other predictor can do any better). Is this a known result from Ergodic Theory?

You're not able to directly edit it yourself?

7Ben Pace
Zvi's crossposts are a bit messy to edit (html and other things) so for Zvi we said we would make fixes to his posts when he makes updates, to reduce the cost on him for cross-posting (having to deal with two editors, especially when the LW one is messy). (I have now made the edit to the post above.)

On Twitter I linked to this saying

Basic skills of decision making under uncertainty have been sorely lacking in this crisis. Oxford University's Future of Humanity Institute is building up its Epidemic Forecasting project, and needs a project manager.

Response:

I'm honestly struggling with a polite response to this. Here in the UK, Dominic Cummings has tried a Less Wrong approach to policy making, and our death rate is terrible. This idea that a solution will somehow spring from left-field maverick thinking is actually lethal.
3[anonymous]
Did Dominic Cummings in fact try a "Less Wrong approach" to policy making? If so, how did it fail, and how can we learn from it? (if not, ignore this)
2Kaj_Sotala
Huh. Wow.

For the foreseeable future, it seems that anything I might try to say to my UK friends about anything to do with LW-style thinking is going to be met with "but Dominic Cummings". Three separate instances of this in just the last few days.

-3TAG
You mean the swine are judging ideas by how they work in practice?

Can you give some examples of "LW-style thinking" that they now associate with Cummings?

2Dagon
Seems like a good discussion could be had about long-term predictions and how much evidence there is to be had in short-term political fluctuations. The Cummings silliness vs unprecedented immigration restrictions - which is likely to have impact 5 years from now?

I look back and say "I wish he had been right!"

Britain was in the EU, but it kept Pounds Sterling, it never adopted the Euro.

How many opportunities do you think we get to hear someone make clearly falsifiable ten-year predictions, and have them turn out to be false, and then have that person have the honour necessary to say "I was very, very wrong?" Not a lot! So any reflections you have to add on this would I think be super valuable. Thanks!

Hey, looks like you're still active on the site, would be interested to hear your reflections on these predictions ten years on - thanks!

[This comment is no longer endorsed by its author]

It is, of course, third-party visible that Eliezer-2010 *says* it's going well. Anyone can say that, but not everyone does.

I note that nearly eight years later, the preimage was never revealed.

Actually, I have seen many hashed predictions, and I have never seen a preimage revealed. At this stage, if someone reveals a preimage to demonstrate a successful prediction, I will be about as impressed as if someone had won a lottery, given the number of losing lottery tickets lying about.

Half formed thoughts towards how I think about this:

Something like Turing completeness is at work, where our intelligence gains the ability to loop in on itself, and build on its former products (eg definitions) to reach new insights. We are at the threshold of the transition to this capability, half god and half beast, so even a small change in the distance we are across that threshold makes a big difference.

As such, if you observe yourself to be in a culture that is able to reach technological maturity, you're probably "the stupidest such culture that could get there, because if it could be done at a stupider level then it would've happened there first."

Who first observed this? I say this a lot, but I'm now not sure if I first thought of it or if I'm just quoting well-understood folklore.

4Ben Pace
For me, I’m pretty sure it was Yudkowsky (but maybe Bostrom) who put it pithily enough that I remembered. Would have to look for a cite.

May I recommend spoiler markup? Just start the line with >!

Another (minor) "Top Donor" opinion. On the MIRI issue: agree with your concerns, but continue donating, for now. I assume they're fully aware of the problem they're presenting to their donors and will address it in some fashion. If they do not, I might adjust next year. The hard thing is that MIRI still seems the most differentiated org, in approach and talent, that can use funds (vs OpenAI and DeepMind and well-funded academic institutions).

2Dr_Manhattan
Thanks for doing this! I couldn't figure out how.

I note that this is now done, as I have for so many things here. Great work team!

Spoiler space test

Rot13's content, hidden using spoiler markup:

Despite having donated to MIRI consistently for many years as a result of their highly non-replaceable and groundbreaking work in the field, I cannot in good faith do so this year given their lack of disclosure. Additionally, they already have a larger budget than any other organisation (except perhaps FHI) and a large amount of reserves.

Despite FHI producing very high quality research, GPI having a lot of promising papers in the pipeline, and both having highly qualified and value-aligned researchers, the ... (read more)

I think the Big Rationalist Lesson is "what adjustment to my circumstances am I not making because I Should Be Able To Do Without?"

Just to get things started, here's a proof for #1:

Proof by induction that the number of bicolor edges is odd iff the ends don't match. Base case: a single node has matching ends and an even number (zero) of bicolor edges. Extending with a non-bicolor edge changes neither condition, and extending with a bicolor edge changes both; in both cases the induction hypothesis is preserved.
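A quick brute-force check of the claim, as a sketch in Python (encoding the two colors as 0 and 1):

from itertools import product

# For every coloring of a path of n nodes, the number of bicolor edges
# (adjacent nodes of different colors) should be odd exactly when the ends differ.
for n in range(1, 8):
    for coloring in product([0, 1], repeat=n):
        bicolor = sum(a != b for a, b in zip(coloring, coloring[1:]))
        assert (bicolor % 2 == 1) == (coloring[0] != coloring[-1])
print("claim verified for all paths of up to 7 nodes")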

Here's a more conceptual framing:

Suppose the ends don't match. If we imagine blue as labelling the odd-numbered segments and green as labelling the even-numbered segments, then the first segment is odd-numbered and the last is even-numbered, so there must be an even number of segments in total. The number of gaps between segments, which is exactly the number of bicolor edges, is the number of segments minus 1, so it is odd. (If the ends matched, the number of segments would instead be odd, and the number of gaps even.)

From what I hear, any plan for improving MIRI/CFAR space that involves the collaboration of the landlord is dead in the water; they just always say no to things, even when it's "we will cover all costs to make this lasting improvement to your building".

0Said Achmiz
Does MIRI/CFAR view having such a landlord as an acceptable state of affairs? Is there a plan for moving to another space, with less recalcitrant owners/renters?

Of course I should have tested it before commenting! Thanks for doing so.
