Very interesting work. One question I've had about this is whether humans can do such planning 'natively', i.e. in our heads, or if we're using tools in ways that are essentially the same as doing "model-based planning inefficiently, with... bottleneck being a potential need to encode intermediate states."
I recently spent a few months thinking about whether LLM-based models can do model-based planning, and wrote a ~40-page report on it: "Report on LLMs and model-based planning". The doc is still a bit rough around the edges - most notably, the concepts of "efficient planning" and "sufficiently convoluted" tasks in section 1 are incompletely defined - but I thought I would share it in its current form, in case others find the framework or early conclusions useful.
The summary is as follows: