I have sometimes wondered (half jokingly) whether text-to-image generative models could replace digital cameras, much as digital cameras replaced film, at least for things like holiday photos and selfies. Such models are certainly already used to augment these images. It could be an improvement in that one could have idealized images of oneself that capture emotions and feelings rather than literally quantized photons, like a painter using artistic license.
Then one could focus on enjoying the activity more and later distill and preserve it in a generated image.
Would that cultivate too many "idealized" memories, though? Is that necessarily good? What other downsides could there be? Do our memories of leisurely moments need to be accurate, or is it better that they are simply conducive to a good life?
An alternative to "text-to-image" models would be "video-to-image": a wearable camera continuously captures the activity, and a model then generates a single image at the end that captures its emotion and essence. That would save us some time, since we could evoke the memory and feelings from a single image rather than from cluttered albums and videos buried in a smartphone.