With the right prompt, I get the following results for a few examples I tried (first attempts, no cherry-picking).
Input: ( ) ( ( ) )
Output: Balanced
Input: ( ) ( ( )
Output: Unbalanced
Input: ) (
Output: Unbalanced
Input: ( ) ( ) ( )
Output: Balanced
So it is definitely able to learn to check balancing for a small number of parentheses.
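For reference, here is what a conventional checker for the same task looks like — a minimal counter-based sketch (the function name is mine, purely illustrative) that the model's answers can be compared against:

def is_balanced(s: str) -> bool:
    """Return True if the parentheses in s are balanced."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ")" with no matching "(" — e.g. ") ("
                return False
    return depth == 0              # every "(" must eventually be closed

# The four examples above:
for s in ["( ) ( ( ) )", "( ) ( ( )", ") (", "( ) ( ) ( )"]:
    print(s, "->", "Balanced" if is_balanced(s) else "Unbalanced")

This agrees with all four of the model's answers above.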
I'm convinced that humans must spike their blood sugar and/or pump their bodies full of stimulants such as caffeine in order to get past the natural tendency to find it unbearably dull to memorize words and syntax by rote, in lifeless connection with the structures of their native language.
Just a comment: This is certainly not true for every human. Some people really enjoy that.
So first you have utility functions that pay both agents 10 if they both cooperate and 1 if they don't.
Then you change the utility functions to pay the agents 0 if they cooperate and 1 if they don't. Naturally, they will then stop cooperating.
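To make that concrete, here is a tiny Python sketch (the payoff tables and function name are mine, purely illustrative) checking whether cooperating is a best response under each of the two utility functions:

# Symmetric payoff tables: payoff[my_action][their_action].
# Actions: "C" = cooperate, "D" = defect.

def best_response(payoff, their_action):
    """Return the action maximizing my payoff against a fixed opponent action."""
    return max(("C", "D"), key=lambda a: payoff[a][their_action])

# First utility function: 10 for mutual cooperation, 1 otherwise.
u1 = {"C": {"C": 10, "D": 1}, "D": {"C": 1, "D": 1}}
# Changed utility function: 0 for mutual cooperation, 1 otherwise.
u2 = {"C": {"C": 0, "D": 1}, "D": {"C": 1, "D": 1}}

print(best_response(u1, "C"))  # "C": cooperating pays against a cooperator
print(best_response(u2, "C"))  # "D": defecting now strictly beats cooperating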
I don’t get it. If you are the one specifying the utility functions, then obviously you can make them cooperate or defect, right?
Sometimes the environment really is adversarial, though.
Regular expressions as implemented in many programming languages work fine for almost all inputs, but their worst-case running time is terrible: backtracking engines can take exponential time on pathological inputs. This is why libraries like re2, which guarantee matching time linear in the input, are needed if you want to let anyone on the Internet use regular expressions in your search engine.
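A quick way to see the terrible upper bounds, assuming Python's standard backtracking re module (the pattern is a textbook pathological case, not something from the comment above):

import re
import time

# Nested quantifiers trigger exponential backtracking in engines like
# Python's re or PCRE; re2 rejects this blowup by construction.
pattern = re.compile(r"(a+)+$")

for n in range(18, 25):
    s = "a" * n + "b"              # almost matches, forcing full backtracking
    start = time.perf_counter()
    pattern.match(s)
    print(n, f"{time.perf_counter() - start:.2f}s")  # time roughly doubles per extra "a"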
Another example that comes to mind: quicksort runs in O(n·log n) expected time if the input is randomly shuffled beforehand (or, equivalently, if pivots are chosen at random), but an adversarially ordered input can force the naive deterministic version into O(n²) behavior.
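A minimal sketch of the randomized variant (choosing the pivot uniformly at random is equivalent to shuffling the input first):

import random

def quicksort(xs):
    """Randomized quicksort: the random pivot gives O(n log n) expected time
    regardless of input order, so an adversary who controls the input
    cannot force the O(n^2) worst case."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)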
Thanks, that makes sense given your assumptions and results.