I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).
I looked at the Q1/4/5 answers[1]. I think they would indeed most likely all get 7s: there's quite a bit of verbosity, and in particular OpenAI's Q4 answer spends a lot of time talking its way around in circles, but I believe there's a valid proof in all of them.
Most interesting is Q1, where OpenAI produces what I think is a very human answer (the same approach I took, and the one I'd expect most human solvers to take), while Google takes a less intuitive approach that ends up much neater. This makes me a little bit suspicious about whether some functionally identical problem showed up somewhere in Google's training data, but if it didn't, that is extra impressive.
At the IMO, Q3 and Q6 are generally much harder: the AI didn't solve Q6, and I haven't gone through the Q3 answers. Q2 was a geometry problem, which is weirder to look through and which I find very unpleasant.
(Credentials: was an IMO team reserve, did some similar competitions)
Have the actual answers the AI produced been posted? I could see this mattering a lot, or not at all, depending on the exact quality of the answers.
If you give a clean, accurate answer that lines up with the expected proof, grading is quite quick and very easy. But if your proof is messy and non-standard, coordinators need to go through it and determine its validity. And if you left out part of the proof, there needs to be a standardized answer to 'how big a gap is this, and how much partial credit do you get?'
(Also, have the exact prompts used been posted? Because it would be very very easy to add small amounts of text or examples that make these problems much easier. If the prompt used for Q4 contains the number '6' at any point in it, for example, I would basically just instantly call that 'cheating').
Not in general. As you say, GM requires positive numbers, but there's a reason for this: imagine GM as log-scaling everything, performing AM on the results, and then converting the answer back out of log scale.
So to get the GM of 10 and 1000:
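Here is a worked version of that arithmetic. It uses base-10 logs as an illustrative choice (any base gives the same GM), with the general formula first:

```latex
\begin{aligned}
\mathrm{GM}(x_1, \dots, x_n) &= 10^{\frac{1}{n}\sum_{i=1}^{n} \log_{10} x_i} \\
\log_{10} 10 = 1, \quad \log_{10} 1000 &= 3 \\
\tfrac{1}{2}(1 + 3) &= 2 \\
\mathrm{GM}(10, 1000) &= 10^{2} = 100 = \sqrt{10 \times 1000}
\end{aligned}
```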
But now notice that:
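Again sketched in base 10: numbers close to zero have large negative logs, and the log of zero diverges. So one tiny value drags the average of the logs far down, and a zero drags it to minus infinity, sending the GM to 0:

```latex
\begin{aligned}
\log_{10}\left(10^{-k}\right) &= -k \\
\lim_{x \to 0^{+}} \log_{10} x &= -\infty
\end{aligned}
```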
and so the GM of 1 million and 0.00000000000000000000000001 is 0.00000000000001, and the GM of 1 billion and 0 is 0. Negative numbers don't have a real-valued log at all, so this won't really lend itself to calculating a GM of a list that includes a negative number.
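The log-then-average view can be sketched in a few lines of Python. The function name `geometric_mean` is just illustrative. Treating zero as the limiting case (GM of 0) is one reasonable choice, and negatives are rejected because they have no real log:

```python
import math

def geometric_mean(values: list[float]) -> float:
    """Illustrative GM: convert to logs, take the arithmetic mean, convert back."""
    if not values:
        raise ValueError("geometric mean of an empty list is undefined")
    if any(v < 0 for v in values):
        # Negative numbers have no real logarithm, so the log-scale view breaks down.
        raise ValueError("geometric mean requires non-negative values")
    if any(v == 0 for v in values):
        # log(0) -> -inf, so the mean of the logs is -inf and exp(-inf) = 0.
        return 0.0
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

print(geometric_mean([10, 1000]))  # approximately 100, up to floating-point error
print(geometric_mean([1e9, 0]))    # 0.0
```

Natural logs are used here instead of base 10, which changes nothing: converting back with `exp` undoes whichever base was used.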
One thing you can do, though, which makes sense if you are e.g. calculating your utility as log(your net worth) in various situations, is calculate the GM of [your current net worth + each value].
For instance, if you are considering a gamble that has a 50% chance of gaining you $2000 and a 50% chance of losing you $1000:
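Here is a rough Python sketch of that calculation. The starting net worths ($1500 and $10,000) are purely hypothetical values for illustration, and `gm_of_outcomes` is an illustrative helper, not a library function:

```python
import math

def gm_of_outcomes(net_worth: float, outcomes: list[float]) -> float:
    """GM of final net worth across equally likely outcomes (hypothetical helper)."""
    finals = [net_worth + change for change in outcomes]
    if any(f <= 0 for f in finals):
        # Simplification: ending at or below zero is treated as ruin
        # (log utility -> -inf), which makes the GM 0.
        return 0.0
    return math.exp(sum(math.log(f) for f in finals) / len(finals))

gamble = [2000, -1000]  # 50% chance of +$2000, 50% chance of -$1000

for net_worth in (1500, 10_000):  # hypothetical starting net worths
    gm = gm_of_outcomes(net_worth, gamble)
    verdict = "take it" if gm > net_worth else "decline"
    print(f"net worth ${net_worth:,}: GM after gamble ${gm:,.2f} -> {verdict}")
```

With $1500 the GM is sqrt(3500 × 500) ≈ $1323, so the gamble is declined. With $10,000 it is sqrt(12000 × 9000) ≈ $10,392, so it is taken. Under these assumptions, the break-even net worth for this particular gamble works out to exactly $2000, since (W + 2000)(W − 1000) = W² gives W = 2000.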
Strongly seconded.
Suppose that two dozen bees sting a human, and the human dies of anaphylaxis. Is the majority of the tragedy in this scenario the deaths of the bees?
I could be convinced that I have an overly rosy view of honey production. I have no real information on it besides random internet memes, which give me an impression like 'bees are free to be elsewhere, but stay in a hive where some honey sometimes gets taken because it's a fair trade for a high-quality artificial hive and an indestructible protector.' That might be propaganda by Big Bee. That might be an accurate summary of small-scale beekeepers but not of large-scale honey production. I am not sure, but I could be convinced on this point.
But the general epistemics on display here do not encourage me to view this as a more trustworthy source than internet memes.
So this boils down to interpreting scatter charts.
Say you plot two normally-distributed numbers against one another. You get something that looks like this:
If instead you plot two d6 rolls against one another, you see this:
with sharp cutoffs because the d6 roll is bounded at 1 below and 6 above, and with a regular grid because the d6 roll is always an integer.
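A quick way to see where the grid and cutoffs come from is to enumerate every possible pair of rolls instead of sampling them. Any scatter of two d6 rolls can only land on these 36 points. This is a small illustrative sketch, not part of the original analysis:

```python
from itertools import product

# Every equally likely (first roll, second roll) pair for two six-sided dice.
points = list(product(range(1, 7), repeat=2))

xs = sorted({x for x, _ in points})
ys = sorted({y for _, y in points})
print(len(points))  # 36 points on the grid
print(xs, ys)       # [1, 2, 3, 4, 5, 6] on both axes: hard cutoffs at 1 and 6
```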
Various relationships between the variables can show up in the scatter chart.
If Y is the sum of two d6 rolls, and X is the first roll, you see this:
You can think of this graph as being made up of various stripes:
The vertical green line is 'every value the second die can roll, given that the first die rolled a 2'.
The diagonal orange line is 'every value the first die can roll, given that the second die rolled a 4'.
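The two stripes can be checked by listing the plotted points directly. This Python sketch assumes the setup above (X = first die, Y = sum of both dice). The helper name `stripe` is hypothetical:

```python
from itertools import product

# All 36 (first, second) outcomes; plot X = first die, Y = sum of both dice.
outcomes = list(product(range(1, 7), repeat=2))

def stripe(fixed_first=None, fixed_second=None):
    """Plotted (X, Y) points for the outcomes matching the fixed die value(s)."""
    return sorted(
        (a, a + b)
        for a, b in outcomes
        if (fixed_first is None or a == fixed_first)
        and (fixed_second is None or b == fixed_second)
    )

vertical = stripe(fixed_first=2)   # X pinned at 2 while Y varies: vertical line
diagonal = stripe(fixed_second=4)  # X and Y rise together: diagonal line
print(vertical)  # [(2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8)]
print(diagonal)  # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
```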
Suppose that X = twice the first die plus the second die, and Y = twice the second die plus the first die:
Again the points form a grid, and again we can see patterns. Since the green line has 6 points on it and moves [up 2 and right 1] each step, we can see something that takes 6 discrete values and applies 2x its value to Y and 1x its value to X (here, that is the second die).
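Here is a small Python check of that step pattern, using the X and Y definitions above. Holding the first die at 1 is an arbitrary choice; any fixed value gives the same step. The helper name `plot_point` is hypothetical:

```python
from itertools import product

def plot_point(first: int, second: int) -> tuple[int, int]:
    """X = twice the first die plus the second; Y = twice the second plus the first."""
    return (2 * first + second, 2 * second + first)

# Hold the first die at 1 and step the second die from 1 to 6.
line = [plot_point(1, second) for second in range(1, 7)]
steps = {(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(line, line[1:])}

print(line)   # [(3, 3), (4, 5), (5, 7), (6, 9), (7, 11), (8, 13)]
print(steps)  # {(1, 2)}: each step moves right 1 and up 2
print(len({plot_point(a, b) for a, b in product(range(1, 7), repeat=2)}))  # 36
```

All 36 outcomes land on distinct points because the map from (first, second) to (X, Y) is linear and invertible (its determinant is 2·2 − 1·1 = 3). That is why the stripes can be read off cleanly.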
Now plot Bella's scores against Liboulen's:
This is a bit more complicated because there are three variables rather than two. But you can still imagine the same lines:
and you can disentangle the corresponding variables.
Thank you for writing this! While I got most of the mechanics, I had some amusing misinterpretations of what they meant:
This might be a cultural/region-based thing. Stop by a bar in Alabama, or even just somewhere rural, and I think there might be more use of bars as matchmaking.