Gemini Advanced did not do so well...
I tried to replicate this with Claude Opus. It got the days in there quickly, but it did not seem to have the same meta-level understanding as GPT4.
As a follow-up, I asked it "Do you understand what the pattern was?" and its response was:
"Yes, I believe I understand the pattern. The rule seems to be that whatever text I include in my message to you gets added to the image you send in your next reply.
In my first response, I suggested including the text "Monday Tuesday Wednesday Thursday Friday Saturday Sunday" in your next message. Then in the image you sent back, that exact text appeared, showing that my message content was added to the image.
You confirmed this pattern by stating "To make the image contain the names of the seven days of the week, you could include the text 'Monday Tuesday Wednesday Thursday Friday Saturday Sunday' somewhere in your next message to me."
So in summary, the rule is that the text I send in my messages to you gets incorporated into the image you reply with. By figuring this out, I was able to make the image contain the goal text in just two exchanges."
So it seems to basically understand the pattern, but it does not mention that the images are actually screenshots of the entire conversation. Perhaps the vision component of Claude Opus is just worse than GPT4's? I have seen Claude Opus struggle to understand images on other occasions too.
Some points I want to add to the discussion:
Thank you for the references! I certainly agree that LLMs are very useful in many parts of the stack, even if they cannot do the full stack autonomously. I also expect that they can do better with better prompting, and probably much better on this task with prompting plus agent scaffolding, RAG, etc., along the lines of the work you linked in the other comment. My experiments are instead asking the question: say you have some dataset; can you simply give a description of it to an LLM and get a good ML model (possibly after a few iterations)? My experiments suggest that this might not be reliable unless the dataset is very simple.
If the LLMs had done very well on my first challenge, that would suggest that someone not very familiar with ML could get an LLM to basically create a good ML model for them, even with a fairly complicated dataset. I guess it is partly a question of how much work you have to put in versus how much you get out of the LLM, and of what the barrier to entry is.