Not being able to figure out what sort of thing humans would rate highly isn't an alignment failure, it's a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you're saying he would. He is right to say that attempting to directly encode the category boundaries won't work. It isn't covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying t...
I'm not quite seeing how this negates my point, help me out?
In this instance the problem the AI is optimizing for isn't "maximize smiley faces", it's "produce outputs that human raters give high scores to". And it's done well on that metric, given that the LLM isn't powerful enough to subvert the reward channel.
I'm sad that the post doesn't go on to say how to get matplotlib to do the right thing in each case!
I think matplotlib has way too many ways to do everything to be comprehensive! But I think you could do almost everything with some variants of these.
ax.spines['top'].set_visible(False) # or 'left' / 'right' / 'bottom'
ax.set_xticks([0,50,100],['0%','50%','100%'])
ax.tick_params(axis='x', bottom=False, top=False) # or axis='y' with left=False / right=False
ax.set_ylim([0,0.30])
ax.set_ylim([0,ax.get_ylim()[1]])
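Putting those together, here's a minimal self-contained sketch (the bar data is made up for illustration; the labels-as-second-argument form of `set_xticks` needs matplotlib 3.5 or later):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 50, 100], [0.10, 0.25, 0.15], width=20)  # placeholder data

ax.spines['top'].set_visible(False)                  # hide the top border
ax.spines['right'].set_visible(False)                # hide the right border
ax.set_xticks([0, 50, 100], ['0%', '50%', '100%'])   # percent labels (matplotlib >= 3.5)
ax.tick_params(axis='x', bottom=False)               # drop x tick marks, keep labels
ax.set_ylim([0, ax.get_ylim()[1]])                   # pin the y-axis bottom at zero
```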
I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?
The difficult thing is tying the signature to the thing signed. Even if signatures are single-use, unless the relying party immediately sees everything you ever sign, a signature can be peeled off something you signed that the relying party never saw and transferred to something you didn't sign.
Of course this market is "Conditioning on Nonlinear bringing a lawsuit, how likely are they to win?" which is a different question.
Extracted from a Facebook comment:
I don't think the experts are expert on this question at all. Eliezer's train of thought essentially started with "Supposing you had a really effective AI, what would follow from that?" His thinking wasn't at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn't expertise on what the results are when it's as effective as Eliezer posits. It's like thinking you shouldn't have an opinion on whether there will be a nuclear conflict over Kashmir unless you're a nuclear physicist.
Thanks, that's useful. Sad to see no Eliezer, no Nate or anyone from MIRI or having a similar perspective though :(
The lack of names on the website seems very odd.
Don't let your firm opinion get in the way of talking to people before you act. It was Elon's determination to act before talking to anyone that led to the creation of OpenAI, which seems to have sealed humanity's fate.
This is explicitly the discussion the OP asked to avoid.
This is true whether we adopt my original idea that each board member keeps what they learn from these conversations entirely to themselves, or Ben's better proposed modification that it's confidential but can be shared with the whole board.
Perhaps this is a bad idea, but it has occurred to me that if I were a board member, I would want to quite frequently have confidential conversations with randomly selected employees.
For cryptographic security, I would use HMAC with a random key. Then to reveal, you publish both the message and the key. This allows you, for example, to securely commit to a one-character message like "Y".
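The scheme fits in a few lines of standard-library Python; `commit` and `verify` are my own names for the two steps, not an established API. The random key is what stops anyone brute-forcing a short message from a bare hash:

```python
import hmac
import hashlib
import secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Commit step: publish the tag now; keep (message, key) secret."""
    key = secrets.token_bytes(32)  # random key hides even a one-byte message
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag, key

def verify(tag: bytes, message: bytes, key: bytes) -> bool:
    """Reveal step: anyone can recompute the tag from (message, key)."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

tag, key = commit(b"Y")
assert verify(tag, b"Y", key)       # honest reveal checks out
assert not verify(tag, b"N", key)   # a swapped message does not
```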
I sincerely doubt very many people would propose mayonnaise!
A jar of mayonnaise would work tolerably: put it on top of a corner of whichever side of the book is tending to swing over and close.
(I agree that I would expect most humans to do better than almost all the AI responses shown here.)
The idea is that I can do all this from my browser, including writing the code.
I'm not sure I see how this resembles what I described?
I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
(I'm considering putting a second cube on top to get five more filters per fan, which would also make it quieter.)
Four more filters per fan, right?
Any thoughts on this today?
Any thoughts on parking? Thanks!
I think this is diminishing marginal returns of consumption, not production.
True; in addition, places vary a lot in their freak-tolerance.
If I lived in Wyoming and wanted to go to a fetish event, I guess I'm driving to maybe Denver, around 3h40 away? I know this isn't a consideration for everyone but it's important to me.
The same is basically true for any niche interest - it will only be fulfilled where there's adequate population to justify it. In my case, particular jazz music.
Probably a lot of people have different niche interests like that, even if they can't agree on one.
Why the 6in fan rather than the 8in one? Would seem to move a lot more air for nearly the same price.
Thank you!
Reminiscent of Freeman Dyson's 2005 answer to the question "What do you believe is true even though you cannot prove it?":
Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers th...
No sarcasm.
You're not able to directly edit it yourself?
On Twitter I linked to this saying
Basic skills of decision making under uncertainty have been sorely lacking in this crisis. Oxford University's Future of Humanity Institute is building up its Epidemic Forecasting project, and needs a project manager.
Response:
I'm honestly struggling with a polite response to this. Here in the UK, Dominic Cummings has tried a Less Wrong approach to policy making, and our death rate is terrible. This idea that a solution will somehow spring from left-field maverick thinking is actually lethal.
Can you give some examples of "LW-style thinking" that they now associate with Cummings?
I look back and say "I wish he had been right!"
Britain was in the EU, but it kept the pound sterling; it never adopted the Euro.
How many opportunities do you think we get to hear someone make clearly falsifiable ten-year predictions, and have them turn out to be false, and then have that person have the honour necessary to say "I was very, very wrong?" Not a lot! So any reflections you have to add on this would I think be super valuable. Thanks!
Hey, looks like you're still active on the site, would be interested to hear your reflections on these predictions ten years on - thanks!
It is, of course, third-party visible that Eliezer-2010 *says* it's going well. Anyone can say that, but not everyone does.
I note that nearly eight years later, the preimage was never revealed.
Actually, I have seen many hashed predictions, and I have never seen a preimage revealed. At this stage, if someone reveals a preimage to demonstrate a successful prediction, I will be about as impressed as if someone wins a lottery, noting the number of losing lottery tickets lying about.
Half formed thoughts towards how I think about this:
Something like Turing completeness is at work, where our intelligence gains the ability to loop in on itself, and build on its former products (eg definitions) to reach new insights. We are at the threshold of the transition to this capability, half god and half beast, so even a small change in the distance we are across that threshold makes a big difference.
As such, if you observe yourself to be in a culture that is able to reach technological maturity, you're probably "the stupidest such culture that could get there, because if it could be done at a stupider level then it would've happened there first."
Who first observed this? I say this a lot, but I'm now not sure if I first thought of it or if I'm just quoting well-understood folklore.
May I recommend spoiler markup? Just start the line with >!
Another (minor) "Top Donor" opinion. On the MIRI issue: I agree with your concerns, but will continue donating, for now. I assume they're fully aware of the problem they're presenting to their donors and will address it in some fashion. If they do not, I might adjust next year. The hard thing is that MIRI still seems the org most differentiated in approach and talent that can use funds (vs OpenAI, DeepMind, and well-funded academic institutions).
I note that this is now done. As I have for so many things here. Great work team!
Spoiler space test
Rot13'd content, hidden using spoiler markup:
Despite having donated to MIRI consistently for many years as a result of their highly non-replaceable and groundbreaking work in the field, I cannot in good faith do so this year given their lack of disclosure. Additionally, they already have a larger budget than any other organisation (except perhaps FHI) and a large amount of reserves.
Despite FHI producing very high quality research, GPI having a lot of promising papers in the pipeline, and both having highly qualified and value-aligned researchers, the ...
I think the Big Rationalist Lesson is "what adjustment to my circumstances am I not making because I Should Be Able To Do Without?"
Just to get things started, here's a proof for #1:
Proof by induction that the number of bicolor edges is odd iff the ends don't match. Base case: a single node has matching ends and an even number (zero) of bicolor edges. Extending with a non-bicolor edge changes neither condition, and extending with a bicolor edge changes both; in both cases the induction hypothesis is preserved.
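The induction can be sanity-checked by brute force: enumerate every two-coloring of a short path and confirm that the bicolor-edge count is odd exactly when the end colors differ. A small sketch (the function name is mine):

```python
from itertools import product

def bicolor_edges(coloring) -> int:
    """Count edges of the path whose two endpoints have different colors."""
    return sum(a != b for a, b in zip(coloring, coloring[1:]))

# Exhaustively check all 2-colorings of paths with up to 10 nodes.
for n in range(1, 11):
    for coloring in product("BG", repeat=n):
        ends_match = coloring[0] == coloring[-1]
        assert (bicolor_edges(coloring) % 2 == 1) == (not ends_match)
```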
Here's a more conceptual framing:
If we imagine blue as labelling the odd numbered segments and green as labelling the even numbered segments, it is clear that there must be an even number of segments in total. The number of gaps between segments is equal to the number of segments minus 1, so it is odd.
From what I hear, any plan for improving MIRI/CFAR space that involves the collaboration of the landlord is dead in the water; they just always say no to things, even when it's "we will cover all costs to make this lasting improvement to your building".
Of course I should have tested it before commenting! Thanks for doing so.
Also Rosie Campbell https://x.com/RosieCampbell/status/1863017727063113803