Since nobody seems to have posted it yet:
Riding a motorcycle for 60 years:
(1-1/800)^60=0.928
Sailing across the ocean every month for 60 years:
(1-(1/10000))^(60*12)=0.931
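For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the risk is independent and constant in every period, which is the same simplification the two formulas above make. The function name is just something I made up for illustration.

```python
def survival_probability(per_period_death_risk: float, periods: int) -> float:
    """Chance of surviving every period, assuming independent, constant risk."""
    return (1.0 - per_period_death_risk) ** periods


# Motorcycle: 1-in-800 risk per year, 60 years.
motorcycle = survival_probability(1 / 800, 60)

# Sailing: 1-in-10000 risk per crossing, one crossing a month for 60 years.
sailing = survival_probability(1 / 10000, 60 * 12)

print(f"motorcycle: {motorcycle:.3f}")  # 0.928
print(f"sailing:    {sailing:.3f}")     # 0.931
```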
The sailing risk is probably overestimated. I have never met anyone who was lost at sea, never seen pictures of someone lost at sea, never heard back from the people who I thought might be lost at sea, and I'm sure to find shore soon and I think I have enough peanut butter for another week...
Two AI accelerants I had not noticed before:
I can attest that you can fool yourself quite often (twice a week?) even if you just use a CLI, frequently switch models, and almost never have context/history. When you drop something in front of an LLM, it says "ok, let's make it work." Coming from a fellow human, that would be a strong signal. Trips something in my brain, I guess. I tried adding criticism and refusals to my CLI thing but it doesn't work.
This could work. I think the hard part is finding a meaningful way to simulate the environment so that the conclusions transfer to real life.
I was at this weird party where everyone started drinking poison. I tried explaining it to them but I didn't have any proof or anything, and I said "look, that's poison, sir" and "ma'am, if you think there's any chance I'm right you should stop drinking that" but no luck. They said "I'm thirsty" or "I already have this cup in my hand" or "maybe water is poison and this is antidote, you know more people die from drowning than random poisoning". I realized I was failing to think about it from their perspective. If I had known all these people for years and this big party was basically what we'd been planning for a while and it was blowing up online, then I would definitely drink it too.
After all, the poison-is-actually-antidote guy could've also said "if there's any chance I'm right you have to drink it right now", which would clearly be just political maneuvering.
Anyways, I miss those guys. They were fun.
With controlling a theoretical RL agent, what's the problem with asking the AI to be 99% sure that it mopped 99% of the floor and then stop?
I remember that if you just ask for a 99% mopped floor, the agent will spend forever getting 99.99999% sure that at least 99% is mopped, but I can't remember the problem with this little patch.
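To make the two cases concrete, here is a toy sketch. It is not a real RL setup and it does not answer the question. It just assumes (my invention, purely for illustration) that each inspection pass halves the agent's remaining doubt that at least 99% of the floor is mopped, so confidence after n passes is 1 - 0.5^n. The function and parameter names are hypothetical.

```python
def checks_until_confident(
    target_confidence: float,
    miss_rate: float = 0.5,   # assumed: each pass halves the remaining doubt
    max_checks: int = 1000,   # safety cap so the loop always terminates
) -> int:
    """Count inspection passes until confidence reaches the target (toy model)."""
    confidence = 0.0
    checks = 0
    while confidence < target_confidence and checks < max_checks:
        checks += 1
        confidence = 1.0 - miss_rate ** checks
    return checks


# "Be 99% sure and stop" versus chasing 99.99999% certainty.
patched_agent_checks = checks_until_confident(0.99)
near_certain_checks = checks_until_confident(0.9999999)

print(patched_agent_checks)  # 7
print(near_certain_checks)   # 24
```

In this toy the patched agent stops after a handful of passes, while the one chasing near-certainty keeps inspecting. Whatever goes wrong with the patch in the real argument is not captured here, since the agent in this sketch has no way to do anything except inspect.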
This resolved yes!
Ok it does seem like an example then. Thank you for spelling it out.
Having some vague thoughts about "evil people". In the movies the heroes love everything and fight to save it, and the villains hate everything and fight to destroy it. I feel like in real life, the heroes love power and control, they want the world to be a certain way, and that happens to be a good world for some others too, and they happen to have good ideas about how to do it (eg USA founding fathers, Xi Jinping) instead of stupid bad ideas (Mao, Stalin). There seems to be an innate human drive for revenge and for genocide of other ethnicities, but besides that, humans don't seem to come packaged with anything that makes us crave death and destruction.
I thought about this because I was thinking about my past choices to avoid psychopaths in my personal life. It is certainly the safe choice to avoid them. But maybe I should put more thought into their preferences and ideas, instead of stopping analysis once psychopathy is obvious.
Like if you are friends with both Mao and Xi Jinping then you shouldn't just think "man those guys are real politicians". You should look closely at what they want and whether their ideas will succeed or fail. Probably obvious to many people, but not me.
Great post! Nice to see something constructive! And half your citations are new to me. Thank you for sharing.
I have spent the last few months reinventing the wheel with LLM applications in various ways. I've been using my own code assistant for about 7 months. I did an AlphaEvolve-style system for generating RL code that learns Atari. Last year I was trying some fancy retrieval over published/patented electronics circuits. Did some activation steering and tried my hand at KellerJordan/modded-nanogpt for a day. Of course, before that I was at METR helping them set evals stuff up.
It hasn't occurred to me to try to draw any conclusions from all this different work, and I didn't think of it really as inter-related in any significant way or relevant experience for much of anything, but your topic here is making me think...
Almost every "optimizing" system I make ends up breaking/glitching/cheating the score function. Then I patch and patch until it works, and by then it looks more like a satisficer.
Getting something really useful seems to take about a month of corrections like this. It looks done/working on the first day, I notice something broken and fix it and declare it done on the second day, etc, but after a month I just don't have any more corrections to make. This is different from eg a web app or game which I never run out of todo items for. Of course when LLMs are involved you have to look three times more carefully to be sure you are measuring what you mean to be measuring.
My point is that I expect projects fitting your description here to basically actually work and be worthwhile, but if it is your (speaking to the anonymous reader) first time doing this, expect that you'll spend 10x as long correcting/improving/balancing scores & heuristics as you'll spend on the core functionality.
As you stated in the post, that's not so different from the process used to make AI assistants (etc) in general.
Making my own AI tools has definitely given some depth/detail to all the theoretical problems I've been reading about and talking about all these years. Particularly it is impressive how long my tools have tricked me at times. It is possible I am still tricked right now.