GasStationManager's Shortform

GasStationManager

LESSWRONG
LW

GasStationManager's Shortform — LessWrong

8 comments, sorted by

Click to highlight new comments since: Today at 7:59 PM

Any Lean enthusiasts here? You might be interested to check out Code with Proofs: the Arena. Test your Lean skills with our coding challenges!

Code for the website is open sourced at https://github.com/GasStationManager/CodeProofTheArena

[-]Gunnar_Zarncke1y30

I guess you got downvoted because it sounded like an ad.

But I think Lean Prover is a programming language with a lot of potential for AI alignment and is mentioned in Provably safe systems: the only path to controllable AGI. It would be good to have more knowledge about it on LessWrong.

[-]GasStationManager7mo10

In this post, I show that with some tooling help, current LLM reasoning models like Claude Sonnet and Opus can do some quite non-trival tasks, including formalizing mathematical proofs as well as producing code implementations with proofs of correctness.

Proving Alpha-Beta (Episode 2), Sorry Hammer and More | Lean4AI @ Gas Station Secret Lab

[-]GasStationManager8mo10

The next installment on my quest to get AIs to produce a correct implementation of Alpha-Beta with machine-checkable proof of correctness. This time featuring GPT 4.1 and o3. Guess who lies repeatedly...

Proving Alpha-Beta, Episode 1 | Lean4AI @ Gas Station Secret Lab

[-]GasStationManager9mo10

In a recent series of posts, I explore building a workflow for coding AIs that allows them to detect and recover from their hallucinations:
Property-Based Testing with Dependent Types: Experiment Alpha-Beta | Lean4AI @ Gas Station Secret Lab
Alpha-Beta with Claude Code | Lean4AI @ Gas Station Secret Lab
Alpha-Beta with Goose + Gemini | Lean4AI @ Gas Station Secret Lab

[-]GasStationManager1y10

Experiments on Hallucination Detection and Recovery

In this initial post on Hallucination Detection and Recovery and follow-up, I explore the use of the interactive theorem prover Lean and its features to help LLMs detect and recover from hallucinations in solving coding tasks.

[-]GasStationManager1y10

I have written an essay "A Proposal for Safe and Hallucination-free Coding AI" (https://gasstationmanager.github.io/ai/2024/11/04/a-proposal.html), in which I propose an open-source collaboration on a research agenda that I believe will eventually lead to coding AIs that have superhuman-level ability, are hallucination-free, and safe.

Comments are welcome!

[-]GasStationManager7mo00

http://ProvablyCorrectVibeCoding.com

Vibe coding sometimes gets a bad rap, but I believe it reresents a fundamental desire: I have a brilliant idea, and want to turn that into something concrete that people can use. I know what I want to create but don't have the programming expertise. Can my AI assistant help implement my idea into software?

However, vibe coding often doesn't work, because current AIs hallucinate. Even with better and better AIs, vibe coding is unsafe, because a misaligned AI can easily sneak malicious code past the vibe coder.

What if, our AI assistant can produce code that is guaranteed to be safe and correct?

In this demo, you will be able to pose a coding task as a formal specification in Lean, a programming language and theorem prover. The formal specification will take the form of a signature of a function to be implemented, plus theorem statements about the function. Then, an AI coding agent will attempt to solve the task by implementing the function (in Lean) and proving the theorem statements (in Lean). Finally, you can verify whether the submitted solution is valid by sending it to the Lean proof checker. If the solution passes the proof checker, you now have runnable code that is guaranteed to be correct.

Moderation Log