All of Bucky's Comments + Replies

Bucky139

Do dragon unbelievers accept this stance? My impression is that dragon agnosticism would often be considered almost as bad as dragon belief.

4jefftk
No one has given me a hard time about it. I say things like "I haven't looked into it" and we move on. The next time it happens I will additionally be able to link to this post.
9xpym
They don't, of course, but if you're lucky enough not to be located among the more zealous of them and be subjected to mandatory struggle sessions, their wrath will generally be pointed at more conspicuous targets. For now, at least.
Bucky6849

I’m confused as to how this fits in with UK politics. I don’t think the minority party has any kind of veto?

I guess we have the House of Lords but this doesn’t really have a veto (at least not long term) and the House of Commons and House of Lords aren’t always or even usually controlled by different factions.

3Martin Randall
Yes, the UK govt is sometimes described as "an elected dictatorship". To the extent this article's logic applies, it works almost exactly the opposite of the description given.
  • The winning party is determined by democracy (heavily distorted by FPTP single-winner constituencies).
  • Once elected, factions within the winning party have the ability to exert veto power in the House of Commons. The BATNA is to bring down the government and force new elections.
The civil service and the judiciary also serve as checks on the executive, as does the UK being a signatory to various international treaties. Also the UK is easy mode, with a tradition of common law rights stretching back centuries. Many differences with Iraq.
Bucky20

One extra thing to consider financially is that if you have a smart meter you can get all of your hot water and a chunk of your heating done at off-peak rates. Our off-peak electricity rates are about equal per kWh to gas rates.

Without this I think our system would cost roughly the same per year as gas, or slightly more; with it I think we save £200 per year or so. (This would be a very long payback time, but there was a fully funded scheme we used.)

If it helps anyone we are in Scotland and get average COP=2.9
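A rough sketch of the arithmetic behind this comparison: the cost of a unit of delivered heat is the fuel price divided by the efficiency (the COP, for a heat pump). Only the COP of 2.9 comes from the comment above; every price and the boiler efficiency below are made-up placeholders, so the output illustrates the shape of the calculation and nothing more.

```python
def cost_per_kwh_heat(fuel_price_per_kwh: float, efficiency: float) -> float:
    """Cost of one kWh of delivered heat.

    `efficiency` is the boiler efficiency for gas, or the COP for a heat pump.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return fuel_price_per_kwh / efficiency


COP = 2.9  # average COP reported in the comment

# Everything below is a hypothetical placeholder, not a quoted tariff.
GAS_PRICE = 0.07             # GBP per kWh of gas
BOILER_EFFICIENCY = 0.90     # assumed gas boiler efficiency
PEAK_ELECTRICITY = 0.25      # GBP per kWh at the standard rate
OFF_PEAK_ELECTRICITY = 0.07  # GBP per kWh off peak, set equal to the gas price

gas_heat = cost_per_kwh_heat(GAS_PRICE, BOILER_EFFICIENCY)
heat_pump_peak = cost_per_kwh_heat(PEAK_ELECTRICITY, COP)
heat_pump_off_peak = cost_per_kwh_heat(OFF_PEAK_ELECTRICITY, COP)

print(f"gas boiler:            {gas_heat:.3f} GBP per kWh of heat")
print(f"heat pump (peak):      {heat_pump_peak:.3f} GBP per kWh of heat")
print(f"heat pump (off peak):  {heat_pump_off_peak:.3f} GBP per kWh of heat")
```

With these placeholder prices the heat pump is slightly more expensive than gas at the standard rate and clearly cheaper off peak, which is the pattern described above. Real tariffs will move the crossover point.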

2AnthonyC
I recently had a coworker in the UK tell me they can get better off-peak rates if they install a home battery system and let the utility control it. I think in general the peak/off-peak rate difference could make a significant difference to these kinds of questions, but it's very dependent on local and regional policy choices shaping energy markets.
Bucky90

In the UK there is a non-binding but generally observed rule that speed cameras allow you to drive 10% + 2mph above the speed limit (e.g. 35mph in a 30mph zone) before they activate.

This is a bit more of a fudge but better than nothing.
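As a small worked example, the "10% + 2mph" guideline can be written as a one-line function. The function name is made up for illustration, and since the rule is non-binding the figures are indicative only.

```python
def camera_threshold_mph(limit_mph: float) -> float:
    """Speed implied by the informal '10% + 2 mph' guideline for a given limit."""
    return limit_mph + limit_mph / 10 + 2


for limit in (20, 30, 40, 50, 60, 70):
    print(f"{limit} mph limit -> {camera_threshold_mph(limit):.0f} mph")
```

For a 30mph limit this reproduces the 35mph figure quoted above.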

Answer by Bucky2-2
  1. Someone in your company gets fired by a boss you don't know/particularly like without giving any reason
  2. You are mad with the boss and want the decision overturned
  3. You have a credible, attractive BATNA (the Microsoft offer)

These 3 items seem like they would be sufficient to cause something like the Open Letter to happen.

In most cases number 3 is not present which I think is why we don't see things like this happen more often in more organisations.

None of this requires Sam to be hugely likeable or a particularly savvy political operator, just that people gener... (read more)

Bucky60

I work in equipment manufacturing for construction so can comment on excavators. Other construction equipment (loaders, dumpers) has a similar story, although excavators have gentler duty cycles and require smaller batteries, so it makes sense to electrify them first. Diesel-hydraulic excavators are also less efficient, giving more potential advantage to electric equipment.

  1. Agree that payback period is relatively low but possibly a bit longer than here - I’ve seen 3-5 years. The ruggedised batteries required for instance can be expensive.

Purchasers of new mac... (read more)

Bucky144

Something similar not involving AIs is where chess grandmasters do rating climbs with handicaps. One I know of was Aman Hambleton managing to reach 2100 Elo on chess.com when he deliberately sacrificed his Queen for a pawn on the third/fourth move of every game.

https://youtube.com/playlist?list=PLUjxDD7HNNTj4NpheA5hLAQLvEZYTkuz5

He had to complicate positions, defend strongly, refuse to trade and rely on time pressure to win.

The games weren’t quite the same as Queen odds as he got a pawn for the Queen and usually displaced the opponent’s king to f3/f6 and p... (read more)

Bucky20

Think you need to update this line too?

This is a bit less than half the rate for the CTA.

2jefftk
Fixed, thanks!
Bucky70

Is there a default direction to twist for the butt bump? The pictures all show the greeters facing in the same direction so one must have turned left and the other right! How do I know which way I should twist?

I cannot sign the assurance contract until I understand this fundamental question

4Adam Zerner
Great question and point! I just used a proprietary GPT-U that is specialized in user research. It scanned the web and determined that 81% of people are inclined to twist clockwise. I'll update the vow to clarify.
Bucky41

Agreed, intended to distinguish between the weak claim “you should stop pushing the bus” and the stronger “there’s no game theoretic angle which encourages you to keep pushing”.

Bucky90

So there's no game theoretic angle, you can just make the decision alone, to stop pushing the frigging bus.

I don’t think this holds if you allow for p(doom) < 1. For a typical AI researcher with p(doom) ~ 0.1 and easy replacement, striking is plausibly an altruistic act and should be applauded as such.

8cousin_it
Hm, pushing a bus full of kids towards a 10% chance of precipice is also pretty harsh. Though I agree we should applaud those who decline to do it.
Bucky30

I haven’t tested extensively but first impression is that this is indeed the case. Would be interesting to see if Sydney is similar but I think there’s a limit on number of messages per conversation or something?

Bucky20

When you did this, did you let ChatGPT play both sides or were you playing one side? I think it is much better if it gets to play both sides.

3Zachary Witten
Both
Bucky100

I tried this with ChatGPT to see just how big the difference was.

ChatGPT is pretty terrible at FEN in both games (Zack and Erik). In Erik’s game it insisted on giving me a position after 13 moves even though 25 moves had happened. When I pointed this out it told me that because there were no captures or pawn moves between moves 13 and 25 the FEN stayed the same…

However it is able to give sensible continuations of >20 ply to checkmate for both positions provided you instruct it not to give commentary and to only provide the moves. The second you allow it... (read more)
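On the FEN point above: a FEN string has six space-separated fields (piece placement, side to move, castling rights, en passant target, halfmove clock, fullmove number). The sketch below uses plain Python and three hand-written FENs for the opening moves 1. Nf3 Nf6, chosen purely as an illustration and not taken from either game. It shows that moves with no captures and no pawn moves still change several fields, so a FEN cannot "stay the same" across twelve moves.

```python
FEN_FIELDS = (
    "placement",
    "side_to_move",
    "castling",
    "en_passant",
    "halfmove_clock",
    "fullmove_number",
)


def parse_fen(fen: str) -> dict:
    """Split a FEN string into its six named fields."""
    parts = fen.split()
    if len(parts) != len(FEN_FIELDS):
        raise ValueError(f"expected 6 FEN fields, got {len(parts)}")
    return dict(zip(FEN_FIELDS, parts))


def changed_fields(before: str, after: str) -> list:
    """Names of the FEN fields that differ between two positions."""
    a, b = parse_fen(before), parse_fen(after)
    return [name for name in FEN_FIELDS if a[name] != b[name]]


# Illustrative positions: start, after 1. Nf3, after 1... Nf6.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
AFTER_NF3 = "rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 1 1"
AFTER_NF6 = "rnbqkb1r/pppppppp/5n2/8/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 2 2"

print(changed_fields(START, AFTER_NF3))
print(changed_fields(AFTER_NF3, AFTER_NF6))
```

The halfmove clock is the field that resets on captures and pawn moves; without them it keeps counting up, which is the opposite of staying the same.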

2Zachary Witten
When I tried this with ChatGPT in December (noticing as you did that hewing close to raw moves was best) I don’t think it would have been able to go 29 ply deep with no illegal moves starting from so far into a game. This makes me think whatever they did to improve its math also improved its chess.
1Towards_Keeperhood
It'd be interesting to see whether it performs worse if it only plays one side and the other side is played by a human. (I'd expect so.)
1Erik Jenner
Thanks for checking in more detail! Yeah, I didn’t try to change the prompt much to get better behavior out of ChatGPT. In hindsight, that’s a pretty unfair comparison given that the Stockfish prompt here might have been tuned to Sydney quite a bit by Zack. Will add a note to the post.
Bucky20

Despite being a GPT-3 instance DALL-E appears to be able to draw an adequate " SolidGoldMagikarp" (if you allow for its usual lack of ability to spell). I tried a couple of alternative prompts without any anomalous results.

Bucky40

FWIW this matches my own experience with one as a company car pretty exactly.

(On mine (UK, right hand drive) the wipers can be activated manually by pressing the button on the end of the left stalk. This also brings up an on-screen menu for selecting constant wiping)

Bucky121

Get out of our houses before we are driven to expend effort killing them, and similarly for all the other places ants conflict with humans (stinging, eating crops, ..)

Ant mafia: "Lovely house you've got there, wouldn't it be a shame if it got all filled up with ants?"

Bucky20

I can't tell you whether this is right or not but this is what ChatGPT thinks:

"Is it possible to put a latex table in a footnote?"

"Yes, it is possible to include a LaTeX table in a footnote. You can do this by using the footnote or footnotetext commands provided by the footmisc package.

Here is an example of how to use these commands to include a table in a footnote:

 

\usepackage{footmisc}
...
\begin{table}[h]
\centering
\begin{tabular}{c c c}
A & B & C \\
1 & 2 & 3 \\
4 & 5 & 6
\end{tabular}
\caption{Table caption}
\end{table}... (read more)

3Ben
Oh, thanks for trying it. It's a good effort in a way: it is definitely trying to put the table in the footnote, so it has the right target. Unfortunately, from my test it doesn't seem to actually work: no error message, but the table doesn't render. It is so much less frustrating to see it fail at the right goal than provide a lot of information about a distinct goal that happens to share the same keywords.
Bucky2310

I think the article undersells the problems of ChatGPT's hallucinations. One example from the article where ChatGPT is said to win is a recipe for risotto. However, I wouldn't follow a risotto recipe from ChatGPT, just because I can't be confident it hasn't hallucinated some portion of the recipe, but I would happily follow one from Google, even if the format is a bit more annoying. The same issue applies to calculating the load-bearing capacity of a beam, only more seriously!

Having said that, it does seem like there are definitely specific areas where ChatGPT will be more use... (read more)

4AnthonyC
Recipe, no. Flavoring/seasoning combo or pairing ideas: absolutely. I've gotten some good ones for some unusual ingredients I happened to have recently. And yes, GPT-4 is noticeably better at coming up with actually tasty non-traditional recommendations.
1Bezzi
I agree. I can also confirm that ChatGPT is indeed making stuff up even in the recipe linked in the article: the traditional risotto recipe used in Italy doesn't include garlic.
Bucky163

One thing I've found useful is to make sure I identify to the supplier what specifically I need about the product I'm ordering - sometimes they have something similar in stock which meets my requirements.

Bucky50

One thing I think makes a big difference to me is whether I feel like the provider is taking a collaborative or adversarial stance.

  1. I don't usually skip ads on Youtube content but if the channel is often clickbaity/misrepresenting content then I will
  2. The printer/ink thing feels very out to get me. The alternative model of printer subscription (e.g. HP) feels a lot more collaborative, so I don't feel the need to ensure that every page I print is as filled with ink as possible so as to get the "best" deal.
  3. If the premium charged on foods in an amusement park/movi
... (read more)
Bucky20

For the six/man thing my first association was six pack. Obviously the prototypical image would be topless but my guess is topless images aren’t in the training set (or DALL-E is otherwise prevented from producing them).

Bucky20

I realised something a few weeks back which I feel like I should have realised a long time ago.

The size of the human brain isn’t the thing which makes us smart, rather it is an indicator that we are smart.

A trebling of brain size vs a chimp is impressive but trebling a neural network’s size doesn’t give that much of an improvement in performance.

A more sensible story is that humans started using their brains more usefully (evolutionarily speaking) so it made sense for us to devote more of our resources to bigger brains for the marginal gains that would giv... (read more)

Bucky101

Thanks for publishing this. I’ve been around the rationality community for a few years and heard TAPs mentioned positively a lot without knowing much about them. This roughly matches my best guess as to what they were but the extra detail is super useful, especially in the implementation.

Bucky51

This suggests a different question: for non-participants who are given the program which creates the data, what probability/timeframe would they assign to success?

On this one I think that I would have assigned a high probability to it being solved but would have anticipated a longer timeframe.

Bucky50

I think the resulting program has lower length (so whatever string it generates has lower KC)

I don’t think this follows - your code is shorter in Python but it includes 3 new built-in functions, which is hidden complexity.

I do agree with the general point that KC isn’t a great measure of difficulty for humans - we are not exactly arbitrary encoders.

Bucky20

What were the noise levels on the Corsi-Rosenthal?

Bucky170

Humans are very reliable agents for tasks which humans are very reliable for.

For most of these examples (arguably all of them) if humans were not reliable at them then the tasks would not exist or would exist in a less stringent form.

Bucky20

Curious as to what the "get under the desks" alarm was supposed to help with and how long ago this was? I’m having trouble fitting it into my world model.

3Yair Halberstadt
It was about 15 to 20 years ago. We had no idea at the time either!
Bucky220

I see that the standard Playground Q&A prompt on OpenAI uses a similar technique (although boringly uses "Unknown" instead of "Yo be real").

I think the thing which throws people off is that when GPT-3 goes wrong it goes wrong in ways that are weird to humans.

I wondered if humans sometimes fail at riddles that GPT-3 would think of as weird. I tried a few that I thought would be promising candidates (no prompt other than the question itself).

 

Q: If a red house is made with red bricks, a blue house is made with blue bricks, a pink house is made with ... (read more)

Bucky60

I think the natural/manmade comparison between COVID and Three Mile has a lot of merit but there are other differences which might explain the difference. Some of them would imply that there would be a strong response to an AI, others less so.

Local vs global

To prevent nuclear meltdowns you only need to ban them in the US - it doesn't matter what you do elsewhere. This is more complicated for pandemic preparedness.

Active spending vs loss of growth

It's easier to pass a law putting in nuclear regulations which limit growth as this isn't as obvious a loss... (read more)

4lc
Yeah, this is a better explanation than my post has. There were definitely multiple factors. One aspect of tractability of these sorts of coordination problems that makes it different from the tractability of problems in everyday life: I don't think people largely "expect" their government to solve pandemic preparedness. It seems like something that can't be solved, to the average voter. Whereas there's pretty much a "zero-tolerance policy" (?) on nuclear meltdowns because that seems to most people like something that should never happen. So it's not necessarily about the problem being solvable in a traditional sense, more about the tendency of the public to blame their government officials when things go wrong. I predict the instinct of the public if "something goes wrong" with AGI will be to say "this should never happen, the government needs to Do Something", which in practice will mean blaming the companies involved and completely hampering their ability to publish or complete relevant research.
Bucky20

Assuming this is the best an AGI can do, I find this a lot less comforting than you appear to. I assume "a very moderate chance" means something like 5-10%?

Having a 5% chance of such a plan working out is insufficient to prevent an AGI from attempting it if the potential reward is large enough and/or they expect they might get turned off anyway. 

Given a sufficient number of AGIs (something we presumably will have in a world where none has taken over), I would expect multiple attempts, so the chance of one of them working becomes high.
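To put rough numbers on that last step: if each attempt succeeds with probability p and the attempts are independent (a simplifying assumption, not something argued for above), the chance that at least one succeeds is 1 - (1 - p)^n. The function names below are made up for this sketch.

```python
import math


def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p_single) ** attempts


def attempts_for_probability(p_single: float, target: float) -> int:
    """Smallest number of independent attempts with P(at least one success) >= target."""
    if not (0 < p_single < 1 and 0 < target < 1):
        raise ValueError("p_single and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))


for n in (1, 5, 10, 20, 50):
    print(f"p = 0.05, {n:2d} attempts -> {p_at_least_one_success(0.05, n):.2f}")

print("attempts needed for a 50% overall chance:", attempts_for_probability(0.05, 0.5))
```

Under these assumptions a 5% per-attempt chance passes 50% overall after 14 attempts. Correlated failures would slow this down, but the direction of the effect is the same.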

Bucky120

There's a theory of humor called benign violation theory.

The BVT claims that humor occurs when three conditions are satisfied: 1) something threatens one's sense of how the world "ought to be", 2) the threatening situation seems benign, and 3) a person sees both interpretations at the same time.

I think your description of pranks etc. fits in nicely with this - you even chose the same words to describe it so maybe you're already aware?

7Duncan Sabien (Deactivated)
Yeah, I was hoping to catch a little resonance there.
Bucky40

It's worth noting that while the number of courses at Berkeley almost doubled in the period shown, the number of courses per student has increased at a lower rate due to an increase in students.

Eyeballing the graph and looking at Berkeley's enrollment numbers I think the number of courses per student has increased by around 50%. Smaller but still a big effect.
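A back-of-envelope check of how these two figures fit together: total courses = students × courses per student, so the growth factors multiply. Taking "almost doubled" as roughly 2× and "around 50%" as 1.5× (both eyeballed values from above, not exact data), the implied growth in enrollment is their ratio.

```python
def implied_enrollment_growth(course_growth: float, per_student_growth: float) -> float:
    """Enrollment growth factor implied by total and per-student course growth."""
    if per_student_growth <= 0:
        raise ValueError("per_student_growth must be positive")
    return course_growth / per_student_growth


COURSE_GROWTH = 2.0       # "almost doubled", rounded up
PER_STUDENT_GROWTH = 1.5  # "around 50%" increase

growth = implied_enrollment_growth(COURSE_GROWTH, PER_STUDENT_GROWTH)
print(f"implied enrollment growth: {growth:.2f}x")
```

That is, the two estimates are consistent with enrollment having grown by about a third over the period.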

Bucky50

Example:

I have a couple of positions I need to fill at my work. I’ve been off on holiday this week and it occurred to me that I should change one of the roles quite a lot and redistribute work.

I’ve had this issue for a few months and while in work I’ve been a bit overworked to actually take a step back and see this opportunity.

Bucky40

That makes me feel less bad for doing the same...

Bucky40

To a first order approximation I think of bureaucracies as status maximisers. I'll admit that status can be a bit nebulous and could be used to explain almost anything but I think a common sense understanding of status gives a good prediction in most cases.

  • Will a bureaucracy usually perform its tasks kinda adequately? Yes
  • Will a bureaucracy excel at its tasks? Not unless excellence comes with status, so almost never
  • Will a bureaucracy look to expand its remit? Yes
  • Will a bureaucracy often look like a blame minimiser? Yes (due to asymmetric justice)

For second ... (read more)

Bucky30

From a practical point of view I would expect the pull fan to better ventilate the corners of the room. On the push side the flow is more directional and I think with a push fan you're more likely to end up with turbulent flow in the corners which would noticeably slow air transfer from these regions. From this point of view it's possible that the 2 x pull configuration may actually be better than 2 x push + 2 x pull but I'm no expert.

Of course if the air speed is low then the difference will be minimal.

Bucky120

One rich dude had a whole island and set it up to have lenses on lots of parts of it, and for like a year he’d go around each day and note down the positions of the stars

You can’t just say that without a name or reference! Not that I don’t believe you - I just want to know more!

That man's name was Tycho Brahe.

Bucky20

Snow scooters!

They're a bit tricky to get the hang of and are petrifying on steep slopes but I highly recommend them. They also make getting to the hill more fun.

Bucky20

Something like this is sometimes recommended in marriage courses for dealing with disagreements. The idea is to keep emotions cool and ensure people are understanding what each other are saying.

Bucky20

So there's a technical definition of edge which is your expected gain for every unit that you bet, given your own probability and the bet odds.

I agree that not clumping up the post is probably best but to make the post correct I suggest adding the underlined text into the definition in case people don't click the link.

bet such that you are trying to win a percentage of your bankroll equal to your percent edge.

A short note to start the review: the author isn’t happy with how the post is communicated. I agree it could be clearer and this is the reason I’m scoring this 4 instead of 9. The actual content seems very useful to me.

AllAmericanBreakfast has already reviewed this from a theoretical point of view but I wanted to look at it from a practical standpoint.

***

To test whether the conclusions of this post were true in practice I decided to take 5 examples from the Wikipedia page on the Prisoner’s dilemma and see if they were better modeled by Stag Hunt or Schelling... (read more)

Bucky20

Yes, I agree that some symptoms are likely highly correlated. I didn't intend to rule out that possibility with that sentence - I was just trying to say how I did my math (although I'm not sure how clear I was!). The correct conclusion is in the following sentence:

So having COVID on average gives you ~0.2 persistent symptoms vs not having COVID, with presumably some people having more than one symptom.

Possibly it would be better to add the caveat "0.2 persistent symptoms of those symptoms investigated".

On the whole I agree with Raemon’s review, particularly the first paragraph.

A further thing I would want to add (which would be relatively easy to fix) is that the description and math of the Kelly criterion are misleading / wrong.

The post states that you should:

bet a percentage of your bankroll equivalent to your expected edge

However the correct rule is:

bet such that you are trying to win a percentage of your bankroll equal to your percent edge.

(emphasis added)

The 2 definitions give the same results for 1:1 bets but will give strongly diverging r... (read more)

2Jacob Falkovich
This is a useful clarification. I use "edge" normally to include both the difference in probability of winning and losing and the different payout ratios. I think this usage is intuitive: if you're betting 5:1 on rolls of a six-sided die, no one would say they have a 66.7% "edge" in guessing that a particular number will NOT come up 5/6 of the time — it's clear that the payout ratio offsets the probability ratio. Anyway, I don't want to clunk up the explanation so I just added a link to the precise formula on Wikipedia. If this essay gets selected on condition that I clarify the math, I'll make whatever edits are needed.
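A minimal sketch of the two rules discussed in this thread, assuming the standard setup of a bet that wins b units per unit staked with probability p and loses the stake otherwise. Edge is taken as the expected gain per unit staked, p·b − (1 − p), matching the technical definition quoted earlier; the function names are made up for illustration. Exact fractions are used so the comparisons are not blurred by floating-point rounding.

```python
from fractions import Fraction


def edge(p: Fraction, b: Fraction) -> Fraction:
    """Expected gain per unit staked: win b with probability p, lose 1 otherwise."""
    return p * b - (1 - p)


def stake_equal_to_edge(p: Fraction, b: Fraction) -> Fraction:
    """Rule as stated in the post: stake a fraction of bankroll equal to the edge."""
    return edge(p, b)


def kelly_stake(p: Fraction, b: Fraction) -> Fraction:
    """Corrected rule: stake so that the amount you stand to WIN equals the edge."""
    return edge(p, b) / b


cases = [
    (Fraction(3, 5), Fraction(1)),      # 60% chance at even odds
    (Fraction(3, 10), Fraction(5)),     # 30% long shot paying 5:1
    (Fraction(9, 10), Fraction(1, 5)),  # 90% favourite paying 1:5
]
for p, b in cases:
    print(
        f"p={float(p):.2f} b={float(b):.2f} "
        f"edge={float(edge(p, b)):.2f} "
        f"stake=edge: {float(stake_equal_to_edge(p, b)):.2f} "
        f"Kelly: {float(kelly_stake(p, b)):.2f}"
    )

# The die example from the reply: laying 5:1 that a given number will NOT come up.
print("die example edge:", edge(Fraction(5, 6), Fraction(1, 5)))
```

At even odds the two rules agree. On the long shot the "stake = edge" rule stakes five times the Kelly amount, and on the heavy favourite it stakes a fifth of it. The die example has an edge of exactly zero, which is the sense in which the payout ratio offsets the probability ratio.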

The post claims:

I have investigated this issue in depth and concluded that even a full scale nuclear exchange is unlikely (<1%) to cause human extinction.

This review aims to assess whether having read the post I can conclude the same.

The review is split into 3 parts:

  • Epistemic spot check
  • Examining the argument
  • Outside the argument

Epistemic spot check

Claim: There are 14,000 nuclear warheads in the world.

Assessment: True

Claim: Average warhead yield <1 Mt, probably closer to 100kt

Assessment: Probably true, possibly misleading. Values I found were:

... (read more)
Bucky30

I suppose it depends how general one is aiming to be. If by general intelligence we mean "able to do what a human can do" then no, at this point the method isn't up to that standard.

If instead we mean "able to achieve SOTA on a difficult problem which it wasn't specifically designed to deal with" then PI-MNIST seems like a reasonable starting point.

Also, from a practical standpoint PI-MNIST seems reasonable for a personal research project.

I do think D𝜋's original post felt like it was overstating its case. From a later comment it seems like they more see... (read more)

Bucky180

I think there's a mistake which is being repeated in a few comments both here and on D𝜋's post which needs emphasizing. Below is my understanding:

D𝜋 is attempting to create a general intelligence architecture. He is using image classification as a test for this general intelligence but his architecture is not optimized specifically for image identification.

Most attempts on MNIST use what we know about images (especially the importance of location of pixels) and design an architecture based on those facts. Convolutions are an especially obvious example of... (read more)

1tailcalled
I think if one wants to test general intelligence, one should throw the algorithm at some problem that requires general intelligence. E.g. if it could reach SOTA on text prediction, that'd be impressive. But I think it would very badly fail at even approaching it, and I don't see any obvious way to improve it.
4D𝜋
Spot on. I hope your explanation will be better understood than mine. Thank you. It 'so happens' that MNIST (but not PI) can also be used for basic geometry. That is why I selected it for my exploration (easy switch between the two modes).
Bucky20

More “for Covid” vs “with Covid” from England:

https://www.bbc.co.uk/news/health-59862568

Ratio in October was 3:1 (for:with) but this has gone down to 2:1. “For” cases are rising but at a lower fractional rate than “with” cases.
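For a sense of scale, a for:with ratio converts to the share of Covid-positive patients being treated primarily for Covid as for / (for + with). The helper name is made up; the ratios are the two quoted above.

```python
def share_admitted_for(for_count: float, with_count: float) -> float:
    """Share of Covid-positive patients admitted primarily for Covid."""
    total = for_count + with_count
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return for_count / total


print(f"October (3:1): {share_admitted_for(3, 1):.0%}")
print(f"Latest (2:1):  {share_admitted_for(2, 1):.0%}")
```

So the shift from 3:1 to 2:1 corresponds to the "for" share falling from three quarters to about two thirds.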

Bucky20

We don’t know which patients are in the hospital because of Covid

BBC reports today (i.e. after the post was published) that 3 in 10 people who are in hospital with COVID in England were admitted for something else.

https://www.bbc.co.uk/news/uk-59814032

4Zvi
Thanks. Now all we need is a historical baseline...