Yep, that "standard library" part does seem problematic; I am not sure an algorithm for listing primes is shorter than just the above lookup table.
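For scale, a bare-bones sieve of Eratosthenes in stdlib-only Python is about this long, so whether it beats the table really does depend on how big the table is:

```python
def primes_up_to(n):
    """Return all primes <= n (assumes n >= 2)."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Strike out every multiple of p starting at p*p.
            sieve[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_prime in enumerate(sieve) if is_prime]
```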
Just to give an example, here is the kind of prompt I am thinking of. I am being very specific about what I want; I think there is very little room for misunderstanding about how I expect the program to behave:
> Write a Python program that reads a `.olean` file (Lean v4.13.0) and outputs the names of the constants defined in the file. The program has to be standalone and only use modules from the Python standard library; you cannot assume Lean to be available in the environment.
o3-mini hallucinates pure garbage for me on this one, like it's not even close.
> If your answer to question A is "a specific thing," and your answer to B is "yes, I'm very clear on what I want," then just explain it thoroughly, and you're likely to get satisfying results. Impressive examples like "rewrite this large complex thing that particular way" fall into this category.
Disagree. It sounds like by "being specific" you mean explaining to the AI how you want the task to be done, which in my opinion can be only mildly useful.
When I am specific to an AI about what I want, I usually still get buggy results unless the solution is easy. (And asking the AI to debug is only sometimes successful, so if I want a fix I have to put in a lot of work to carefully understand the code the AI wrote in order to debug it.)
I guess the reasoning for why the solution given in the post is more "valid" than this one is "something something Occam's razor", or that it is "more elegant" (whatever "elegant" means), but if someone can make a more precise argument I would be interested to hear it. (In particular, in my mind Occam's razor has something to do with empiricism, while what we are doing here is pure logic, so I am not sure how exactly it applies.)
Unfortunately no, I don't think any contradictions can be derived from the examples given in the post if we assume `-E` and `[E]` are unary and `E E` is a binary operator. Here are some example assignments for these operators that satisfy (AFAICT) the examples from the post (assuming left associativity for juxtaposition, and that the precedence of `-` is lower, so that `-E E` is interpreted as `-(E E)` in the last example):
Definitions for `[E]`:

```
1 => 2
2 => 4
-1 => 1/2
-2 => 1/4
1/2 => sqrt(2)
1001 => 1001
1004 => 1004
```

Definitions for `E E`:

```
1 2 => 1001
1001 1 => 3
1001 1/2 => 1001
1001 4 => 25/24
1 1 => 1002
1002 1 => 1002
1002 2 => 19
4 2 => 12
4 1 => 1003
1003 2 => 20
1 1/2 => 1004
sqrt(2) 1004 => 72^1/6
1/4 1 => 1005
1005 2 => 5/4
12 1 => 1006
1006 2 => 84
```

Definitions for `-E`:

```
-(1) => -1
-(2) => -2
-(1001) => 1001
```
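To make the left-associative reading concrete, here is a minimal Python sketch (purely illustrative; the values are kept as opaque strings, since only their identities matter for the lookups) encoding the tables above:

```python
# The three operators as lookup tables; `juxt` folds a chain of
# juxtaposed subterms from the left: ((a b) c) d ...
BRACKET = {  # [E]
    "1": "2", "2": "4", "-1": "1/2", "-2": "1/4",
    "1/2": "sqrt(2)", "1001": "1001", "1004": "1004",
}
NEG = {"1": "-1", "2": "-2", "1001": "1001"}  # -E
JUXT = {  # E E
    ("1", "2"): "1001", ("1001", "1"): "3", ("1001", "1/2"): "1001",
    ("1001", "4"): "25/24", ("1", "1"): "1002", ("1002", "1"): "1002",
    ("1002", "2"): "19", ("4", "2"): "12", ("4", "1"): "1003",
    ("1003", "2"): "20", ("1", "1/2"): "1004",
    ("sqrt(2)", "1004"): "72^1/6", ("1/4", "1"): "1005",
    ("1005", "2"): "5/4", ("12", "1"): "1006", ("1006", "2"): "84",
}

def juxt(*terms):
    """Fold already-evaluated subterms left-associatively."""
    acc = terms[0]
    for t in terms[1:]:
        acc = JUXT[(acc, t)]
    return acc
```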
I fairly quickly figured out that the grammar is something like `E ::= "[]" | "[" E "]" | "-" E | "E E"`, and that `eval([E]) = 2^eval(E)` (and `eval(-E) = -eval(E)`), and then went down the rabbit hole of trying to come up with some `f` with `eval(E1 E2) = f(eval(E1), eval(E2))` for juxtaposition, and thinking about whether it's left- or right-associative. I was also thinking that maybe it's n-ary rather than binary, so that associativity does not matter.
Anyway, I think where I went wrong is that I decided that `[E]` is a unary operator by itself, and did not reconsider this. So the lesson is... don't just assume the simplest possible AST structure implied by the grammar?
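For what it's worth, here is a quick Python sketch of the reading I had in mind (purely my reconstruction, with `-` parsed at the lowest precedence; the `f` for juxtaposition stays a stub, and `eval("[]") = 1` is my guess, not pinned down by the examples):

```python
# Tokens are just the characters "[", "]", "-".
def parse(tokens, i=0):
    """E ::= "[]" | "[" E "]" | "-" E | E E, juxtaposition left-associative."""
    node, i = parse_atom(tokens, i)
    while i < len(tokens) and tokens[i] != "]":
        rhs, i = parse_atom(tokens, i)
        node = ("app", node, rhs)  # fold juxtaposition from the left
    return node, i

def parse_atom(tokens, i):
    if tokens[i] == "[":
        if tokens[i + 1] == "]":  # the "[]" leaf
            return ("empty",), i + 2
        inner, i = parse(tokens, i + 1)
        assert tokens[i] == "]"
        return ("bracket", inner), i + 1
    if tokens[i] == "-":  # lowest precedence: -E E parses as -(E E)
        inner, i = parse(tokens, i + 1)
        return ("neg", inner), i
    raise ValueError(f"unexpected token {tokens[i]!r}")

def ev(node):
    kind = node[0]
    if kind == "empty":
        return 1  # assumption on my part; the examples don't pin this down
    if kind == "bracket":
        return 2 ** ev(node[1])  # eval([E]) = 2^eval(E)
    if kind == "neg":
        return -ev(node[1])      # eval(-E) = -eval(E)
    raise NotImplementedError("app: the f I never found")

# e.g. parse(list("-[][]"))[0] == ('neg', ('app', ('empty',), ('empty',)))
```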
> Pull requests. Useful to group a bunch of messy commits into a single high-level purpose and commit that to `main`. Makes your commit history easier to read.
You can also squash multiple commits without using PRs. In fact, if someone meticulously edited their commit history for a PR to be easy to follow, with the changes in each commit grouped into higher-level logical units of change, squashing their commits can be actively bad: you are destroying that structure and making the history less readable by collapsing it into a single messy commit.
With most SWEs, when I try to get them to create nicer commit histories, I get pushback. Sure, not knowing the tools (`git add -p` and `git rebase -i`, mostly, tbh) can be a barrier, but showing them nice commit histories does not motivate them to learn the tools used to create them. They don't seem to see the value in a nice commit history.[1]
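For reference, the basic flow I try to sell them on is short (the commit message and branch name here are hypothetical):

```bash
# Stage only the hunks that belong to one logical change, then commit;
# repeat until the working tree is clean:
git add -p
git commit -m "Extract config parsing into its own module"

# Before merging, rewrite the branch's messy WIP commits into a
# readable sequence (reorder, squash, reword):
git rebase -i main
```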
Which makes me wonder: why do you advocate for putting any effort into the git history for research projects (saying that "It's more important here to apply 'Good SWE practices'"), when 99% of even SWEs don't follow good practices here? (Is looking back at the history maybe more important for research than for SWE, given that you describe research code as being more journal-like?)
Which could maybe be because they also don't know the tools that can extract value from a nice commit history? E.g. using `git blame` or `git bisect` is a much more pleasant experience with a nice history. ↩︎
Does Overleaf have such an AI integration that can get "accidentally" activated, or are you using some other AI plugin?
Either way, this sounds concerning to me: we are so bad at AI boxing that the AI doesn't even have to break out, we just "accidentally" hand it edit access to random documents. (And an AI safety research paper in particular is not something I would want a misaligned AI editing without close oversight.)
Could someone explain the joke to me? If I take the above statement literally, some change made it into your document which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who made that edit and when, which should already narrow it down a lot?
What does "obscure" mean here? (If you label the above "obscure", I feel like every query I consider "non-trivial" could be labeled obscure.)
I don't think Lean is obscure; it's one of the most popular proof assistants nowadays. The whole Lean codebase should be in the AI's training corpus (in fact, that's why I deliberately specified an older version: I happen to know that the `.olean` header changed recently). If you have access to the codebase and understand the object representation, the solution is not too hard.
Here is the solution I wrote just now:[1]
(It's minified both to fit in the comment better and to make it less useful as future AI training data, hopefully keeping this question useful for testing AI skills.)
I wrote this after I wrote the previous comment; my expectation that this would be a not-too-hard problem was not informed by actually attempting it, only by rough knowledge of how Lean represents objects and the fact that they are serialized pretty much as-is. ↩︎