Not being able to figure out what sort of thing humans would rate highly isn't an alignment failure, it's a capabilities failure, and Eliezer_2008 would never have assumed a capabilities failure in the way you're saying he would. He is right to say that attempting to directly encode the category boundaries won't work. It isn't covered in this blog post, but his main proposal for alignment was always that as far as possible, you want the AI to do the work of using its capabilities to figure out what it means to optimize for human values rather than trying t...
I'm not quite seeing how this negates my point, help me out?
I think matplotlib has way too many ways to do everything to be comprehensive! But I think you could do almost everything with some variants of these.
ax.spines['top'].set_visible(False) # or 'left' / 'right' / 'bottom'
ax.set_xticks([0, 50, 100], labels=['0%', '50%', '100%'])  # labels arg needs matplotlib >= 3.5
ax.tick_params(axis='x', bottom=False, top=False)  # or axis='y' with left=False / right=False
ax.set_ylim([0,0.30])
ax.set_ylim([0,ax.get_ylim()[1]])
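Put together, a minimal sketch of how these fit into a script (the figure and data are invented for illustration; the labels argument to set_xticks/set_yticks needs matplotlib >= 3.5):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(['a', 'b', 'c'], [0.10, 0.25, 0.15])  # made-up data, plotted as fractions
ax.spines['top'].set_visible(False)          # drop the top and right frame lines
ax.spines['right'].set_visible(False)
ax.set_yticks([0, 0.1, 0.2, 0.3], labels=['0%', '10%', '20%', '30%'])
ax.tick_params(axis='x', bottom=False)       # hide the x tick marks
ax.set_ylim([0, 0.30])
plt.show()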
Extracted from a Facebook comment:
I don't think the experts are expert on this question at all. Eliezer's train of thought essentially started with "Supposing you had a really effective AI, what would follow from that?" His thinking wasn't at all predicated on any particular way you might build a really effective AI, and knowing a lot about how to build AI isn't expertise on what the results are when it's as effective as Eliezer posits. It's like thinking you shouldn't have an opinion on whether there will be a nuclear conflict over Kashmir unless you're a nuclear physicist.
Reminiscent of Freeman Dyson's 2005 answer to the question "What do you believe is true even though you cannot prove it?":
Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers th...
On Twitter I linked to this, saying:
Basic skills of decision making under uncertainty have been sorely lacking in this crisis. Oxford University's Future of Humanity Institute is building up its Epidemic Forecasting project, and needs a project manager.
Response:
I'm honestly struggling with a polite response to this. Here in the UK, Dominic Cummings has tried a Less Wrong approach to policy making, and our death rate is terrible. This idea that a solution will somehow spring from left-field maverick thinking is actually lethal.
How many opportunities do you think we get to hear someone make clearly falsifiable ten-year predictions, and have them turn out to be false, and then have that person have the honour necessary to say "I was very, very wrong"? Not a lot! So any reflections you have to add on this would I think be super valuable. Thanks!
I note that nearly eight years later, the preimage was never revealed.
Actually, I have seen many hashed predictions, and I have never seen a preimage revealed. At this stage, if someone reveals a preimage to demonstrate a successful prediction, I will be about as impressed as if someone wins a lottery, noting the number of losing lottery tickets lying about.
Half-formed thoughts towards how I think about this:
Something like Turing completeness is at work, where our intelligence gains the ability to loop in on itself, and build on its former products (e.g. definitions) to reach new insights. We are at the threshold of the transition to this capability, half god and half beast, so even a small change in the distance we are across that threshold makes a big difference.
As such, if you observe yourself to be in a culture that is able to reach technological maturity, you're probably "the stupidest such culture that could get there, because if it could be done at a stupider level then it would've happened there first."
Who first observed this? I say this a lot, but I'm now not sure if I first thought of it or if I'm just quoting well-understood folklore.
May I recommend spoiler markup? Just start the line with >!
Another (minor) "Top Donor" opinion. On the MIRI issue: agree with your concerns, but continue donating, for now. I assume they're fully aware of the problem they're presenting to their donors and will address it in some fashion. If they do not might adjust next year. The hard thing is that MIRI still seems most differentiated in approach and talent org that can use funds (vs OpenAI and DeepMind and well-funded academic institutions)
The rot13'd content, hidden using spoiler markup:
Despite having donated to MIRI consistently for many years as a result of their highly non-replaceable and groundbreaking work in the field, I cannot in good faith do so this year given their lack of disclosure. Additionally, they already have a larger budget than any other organisation (except perhaps FHI) and a large amount of reserves.
Despite FHI producing very high quality research, GPI having a lot of promising papers in the pipeline, and both having highly qualified and value-aligned researchers, the ...
Just to get things started, here's a proof for #1:
Proof by induction that the number of bicolor edges is odd iff the ends don't match. Base case: a single node has matching ends and an even number (zero) of bicolor edges. Extending with a non-bicolor edge changes neither condition, and extending with a bicolor edge changes both; in both cases the induction hypothesis is preserved.
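As a sanity check, here's a brute-force verification of the claim (assuming the setup is a path of blue/green nodes, a bicolor edge is an adjacent pair of different colours, and "ends match" means the first and last nodes share a colour):

from itertools import product

for n in range(1, 10):
    for colouring in product('BG', repeat=n):
        # count edges whose two endpoints differ in colour
        bicolor = sum(a != b for a, b in zip(colouring, colouring[1:]))
        ends_differ = colouring[0] != colouring[-1]
        assert (bicolor % 2 == 1) == ends_differ
print('parity claim holds for all paths with fewer than 10 nodes')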
Here's a more conceptual framing:
If we imagine blue as labelling the odd-numbered segments (maximal runs of a single colour) and green as labelling the even-numbered segments, then a path whose ends don't match must have an even number of segments in total. The gaps between segments are exactly the bicolor edges, and there is one fewer gap than there are segments, so their number is odd.
Also Rosie Campbell https://x.com/RosieCampbell/status/1863017727063113803