I often want to include an image in my posts to give a sense of a situation. A photo communicates the most, but sometimes that's too much: some participants would rather remain anonymous. A friend suggested running pictures through an AI model to convert them into a Studio Ghibli-style cartoon, as was briefly a fad a few months ago:

House Party Dances

Letting Kids Be Outside

The model is making quite large changes, aside from just converting to a cartoon, including:

  • Moving people around
  • Changing posture
  • Substituting clothing
  • Combining multiple people into one
  • Changing races
  • Giving people extra hands

For my purposes, however, this is helpful, since I'm trying to illustrate the general feeling of the situation and an overly faithful cartoon could communicate identity too well.

I know that many of my friends are strongly opposed to AI-generated art, primarily for its effect on human artists. While I have mixed thoughts that I may try to write up at some point, I think this sort of usage isn't much of a grey area: I would previously have just left off the image. There isn't really a situation where I would have commissioned art for one of these posts.

Comment via: facebook, mastodon, bluesky, substack

46 comments

You don't know whether I can find photos of the people who wanted to remain anonymous, given those pictures and the techniques available in a year.

It's a possibility, but this seems to me to remove a ton of information. The Ghibli faces all look quite similar to me. I'd be very surprised if they could be de-anonymized in cases like these (people who aren't famous) in the next 3 years, if ever.

If you're particularly paranoid, I presume we could have a system do a few passes. 
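As a minimal sketch of what "a few passes" might look like: feed each output back in as the next input. Here `stylize` is a hypothetical placeholder for whatever image-to-image model is used, not a real API:

```python
from PIL import Image

def stylize(image: Image.Image) -> Image.Image:
    """Hypothetical placeholder for an image-to-image stylization model."""
    raise NotImplementedError  # swap in your model call here

def anonymize(path: str, passes: int = 3) -> Image.Image:
    # Each pass re-draws the previous output, discarding more of the
    # identifying detail that survived the pass before it.
    image = Image.open(path).convert("RGB")
    for _ in range(passes):
        image = stylize(image)
    return image
```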

That would be ordinarily paranoid.

Somebody who doesn’t understand cryptography might devise twenty clever-seeming amateur codes and apply them all in sequence, thinking that, even if one of the codes turns out to be breakable, surely they won’t all be breakable. The NSA will assign that mighty edifice of amateur encryption to an intern, and the intern will crack it in an afternoon.

Even quantum cryptography couldn't restore cleartext that had half of it redacted and replaced with "----" or something.

You can use whichever information you got to update your priors about what message was sent.

Yeah, but then you really lose the capacity to deanonymize effectively. On priors, I can guess you’re likely to be American or Western European, probably like staying up late if you’re the former/live in Western timezones. I can read a lot more of your comments and probably deduce a lot, but just going off your two comments alone doesn’t make it any more likely to find where you live, for instance.

Identifying the locations of the pictures seems quite plausible to me, but the model has done such a "bad" job with the people that I really doubt the information is still there. Though you're right that I don't know for sure.

E.g. gender tends to make it into the picture, which is one bit. Are there 33 bits? We don't know the model's idiosyncrasies, but I wouldn't be surprised to learn of correlations like "scars on input faces translate into stoic expressions". Separately I can get a bunch of bits by assuming that the person has been on a photo before that includes one of the people in the picture or that was taken in a nearby location.
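For concreteness on the "33 bits": that is roughly the entropy needed to single out one person among eight billion, and each attribute that reliably survives the stylization chips away at it. A quick back-of-the-envelope check in Python:

```python
import math

# ~33 bits of entropy suffice to single out one person on Earth:
print(math.log2(8_000_000_000))  # ≈ 32.9

# Each reliably-transmitted attribute narrows the candidate pool:
# gender ≈ 1 bit, age decade ≈ 3 bits, a known metro area ≈ 10+ bits.
# The question is how many such bits survive the style transfer.
```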

I'm not sure gender made it through for background people in the puddle image. In the other direction, though, there are lots of confounders. Some of the "people" are actually multiple people merged together. Race is partially randomized. Some faces are fully invented, since they're not visible in the original pictures.

(To make sure we're on the same page, I'm not claiming the party image anonymized me.)

Why not generate them with a more generic art style?

Any art style becomes generic if repeated enough. Having now seen a bunch of Ghiblified images, I find them as emptily cloying as a Thomas Kinkade. When you can just push the button to get another, and another, like Kinkade did without a computer, it renders the thing meaningless. What are such images for, when all the specificity of the scene they are supposed to be illustrating has been scrubbed off?

Have some Bouguereau! Long dead and safely out of copyright.

"A traditional American barn dance, in the style of William-Adolphe Bouguereau."

I actually feel like this is a particularly bad use of the tool, because it is random enough in the number and scope of its errors that I can't be confident in my mental picture of this person at all. And on top of that, this specific one renders people in pretty predictable fashion styles, so I don't really know what she looks like in any sense.

I can't correct for the errors in the way I could for human-created art. If this were drawn by an actual Ghibli artist, I'd feel pretty confident that it was broadly like her, and then I might be able to extrapolate her facial features by comparing this character to actual Ghibli characters. AI isn't performing the same function of transforming real-person features into Ghibli-character features that a human artist would, so I can't expect that it would map features to drawn features in the same way. It might just pick a "cute" face that looks nothing like her.

many of my friends are strongly opposed to AI-generated art, ... I think this sort of usage isn't much of a grey area: I would previously have just left off the image.


I agree it's not much grey area: why wouldn't you just leave out the image? I don't see a case made for what value they bring to posts (but I also don't know what kind of posts you're referring to).

If I were one of your friends, I would (edit: be one of those who) ask you not to post a picture of me online. One of my major motivations would be to avoid it becoming training data. If you instead fed a picture of me directly to an LLM explicitly designed to offensively (to him) imitate one of the great artists of our time, I would probably reconsider our friendship altogether.

EDIT: This is clearly an unwelcome comment but nobody is saying why, so I'm asking. I think the tone is more hostile than necessary. I think the substance is pretty light. But to me, both are also clearly in the ballpark of the original post. I can restate it? But again, it feels to me like the original post is being shown undue reverence when I actually think it's not a very high quality post.

I read this post as, "Here is a neat toy for hackers that I found," which is fine, but not a discussion of rationality. It's not what I come here for but I'm the newcomer so I can be wrong on the appropriateness of this post. But also the toy is very controversial, even to the author's friends, and the author is pretty dismissive of that. And the toy is being used, seemingly without consent, on friends who have privacy concerns in a way that seemingly still violates their privacy (it is not said whether or not the subjects agreed to be ghiblified). So I felt these were worth criticizing - did I do that wrong or are these types of criticisms unwelcome here?

EDIT 2: "unwelcome" is certainly the wrong word as this has now been upvoted, it's merely a disagreeable comment. Would like to know the ways in which people disagree but oh well.

I also don't know what kind of posts you're referring to

The posts in question (linked from this post) are House Party Dances and Letting Kids Be Outside.

I see, I didn't notice those were links to the posts.

I don't think that changes anything, though. I also think the style choice doesn't contribute to the vibe of the post, but arguably that's just a personal preference.

And the toy is being used, seemingly without consent, on friends who have privacy concerns in a way that seemingly still violates their privacy

In the case of the house party, I asked the party hosts if I could post a photograph, and one of them suggested this method of anonymization. In the puddle picture I didn't ask, but the kids are so generic that I really don't see how it would violate their privacy.

Alright - that is fine, but it also doesn't really respond to the points I was making. I wasn't concerned with whether there was consent from people in a particular instance, but with how consent wasn't described as part of your process.

Also, as I mentioned, for the kids: just uploading the photos to the ghibli LLM site seems like a likely privacy violation. 

Do you think it's wrong to take pictures in public places and put them online, even if there are people in the background who didn't consent? I think of this as a very normal thing to do, though it does make them available for LLM-training-scraping among other things.

My answer to that question would not quite be a categorical yes or no.  For example, there’s a difference between a manually taken selfie and a complete raw security camera feed.

But I do agree this is straying from the original topic a bit.  Since the top-post use case is explicitly one where you’ve already decided you’re not comfortable posting the original photo publicly, I feel like the general acceptability of posting photos is mostly irrelevant here?  I think a more on-point justification would be talking about why it’s more acceptable for an AI to see the original photo than for your general audience to see that same photo.

(To be clear I don’t personally have a major problem with this practice, at least as you’ve applied it so far, although I also don’t think it’s really added or subtracted much from my enjoyment or understanding of your posts so far.  Mostly I just don’t find this particular justification to be convincing.)

Personally, yeah I do think it's mildly wrong. Normal and ethical aren't always correlated.

Again though, isn't this getting a little off topic? It seems to be staying that way so it's fine with me if we just let this conversation die off. Almost none of my questions or points are really being addressed (by you or the many others who disagree with me), so continuing doesn't seem worthwhile. I've been hanging onto the thread hoping to get some answers, but from my perspective it's just continued misdirection. I don't think that's your intent, and I may have put you on the defensive, but overall this is a negative-value conversation for me as I'm leaving more confused.

I admit, I've been drive-by downvoting posts with ghiblified illustrations, which isn't how downvotes are supposed to be used. Something about them is so incredibly upsetting to me.

I don't know why everyone is making this so complicated when there's a clear disqualifying factor for me: Miyazaki himself has said that they did not consent to be trained on, would not have consented to being trained on, and do not want anyone making Ghibli art, and all of this was known before Sam Altman started pushing Ghiblification. There are other factors too, but this one by itself is already sufficient for me.

EDIT: I see a lot of upvotes and disagreement on this comment, which I think I agree with. I should have clarified, this is personally disqualifying to me, because I personally care a little about respecting Miyazaki's wishes, and even though he's a grumpy old man I disagree with on a lot of things, he's also someone I care about in a small way so I try to be respectful of what I understand he's tried to teach me about the world, if that makes sense? I was definitely not advocating for this to become government policy or something, though I do separately agree with that recent memo from the Copyright office.

One obvious reason to get upset is how low the standards of people posting them are. Let's take jefftk's post. It takes less than 5 seconds to spot how lazy, sloppy, and bad the hands and arms are, and how the picture is incoherent and uninformative. (Look at the fiddler's arms, or the woman going under 2 arms that make zero sense, or the weird doors, or the table which seems to be somehow floating, or the dubious overall composition - where are the yellow fairy and non-fairy going, exactly?, or the fact that the image is the stereotypical cat-urine yellow of all 4o images.) Why should you not feel disrespected and insulted that he was so careless and lazy to put in such a lousy, generic image?

I was in this case assuming it was a ghiblified version of a photo, illustrating the very core point of this post. Via this mechanism it communicated a lot! Like how many people were in the room, how old they were, a lot about their emotional affect, how big the room was, and lots of other small details.

First, I didn't say it wasn't communicating anything. But since you bring it up: it communicated exactly what jefftk said in the post already describing the scene. And what it did communicate that he didn't say cannot be trusted at all. As jefftk notes, 4o in doing style transfer makes many large, heavily biased, changes to the scene, going beyond even just mere artifacts like fingers. If you don't believe that people in that room had 3 arms or that the room looked totally different (I will safely assume that the room was not, in fact, lit up in tastefully cat-urine yellow in the 4o house style), why believe anything else it conveys? If it doesn't matter what those small details were, then why 'communicate' a fake version of them all? And if it does matter what those small details were, surely it's bad to communicate a fake, wrong version? (It is odd to take this blase attitude of 'it is important to communicate, and what is communicated is of no importance'.)

Second, this doesn't rebut my point at all. Whatever true or false things it does or does not communicate, the image is ugly and unaesthetic: the longer you look at it, the worse it gets, as the more bland, stereotypical, and strewn with errors and laziness you understand it to be. It is AI slop. (I would personally be ashamed to post an image even to IRC, never mind my posts, which embodies such low standards and disrespects my viewers that much, and says, "I value your time and attention so little that I will not lift a finger to do a decent job when I add a big attention-grabbing image that you will spend time looking at.") Even 5 seconds to try to inpaint the most blatant artifacts, or to tell ChatGPT, "please try again, but without the yellow palette that you overuse in every image"*, would have made it better.

* incidentally, I've been asking people here if they notice how every ChatGPT 4o-generated image is by default yellow. Invariably, they have not. One or two of them have contacted me later to express the sentiment that 'what has been seen cannot be unseen'. This is a major obstacle to image editing in 4o, because every time you inpaint, the image will mutate a decent bit, and will tend to turn a bit more yellow. (If you iterate to a fixed point, a 4o image turns into all yellow with sickly blobs, often faces, in the top left. It is certainly an odd generative model.)
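If you want to check the yellow cast yourself, here's a minimal sketch using Pillow and NumPy; it measures a simple yellow-minus-blue balance, which is my own rough proxy rather than anything 4o-specific:

```python
import numpy as np
from PIL import Image

def yellow_cast(path: str) -> float:
    """Mean yellow-vs-blue balance of an image (positive = yellow cast).

    Yellow in RGB means high red and green relative to blue, so we
    compare the average of the R and G channels against the B channel.
    """
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return float(((r + g) / 2.0 - b).mean())

# Compare a generated image against its source photo:
# yellow_cast("ghibli_output.png") - yellow_cast("original.jpg")
```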

Gwern, look, my drawing skills are pretty terrible. We've had Sequences posts up here for years with literal pictures of napkins on which Eliezer drew bad and ugly diagrams. Yes, not everything in the image can be trusted, but surely I have learned many real and relevant things about the atmosphere and vibe from the image that I would not from a literal description (and at the very least it is much faster for me to parse than a literal description).

I know the kinds of errors that image models make, and so I can adjust for them. They overall make many fewer errors than jefftk would make if he were to draw some stick figures himself, which would still be useful. 

The image is clearly working at achieving its intended effect, and I think the handwringing about it being unaesthetic is overblown compared to all realistic alternatives. Yes, it would be cool if jeff prompted more times, but why bother, it's getting the job done fine, and that's what the whole post is about.

surely I have learned many real and relevant things about the atmosphere and vibe from the image that I would not from a literal description


But what are they? You've received some true information, but it's in a sealed box with a bunch of lies. And you know that, so it can't give you any useful information. You might arbitrarily decide to correct in one direction, but end up correcting in the exact opposite direction from reality.

For example: we know the AI tends to yellow images. Therefore, seeing a yellowed AI-generated image, that tells us that the color of the original image was either not yellow or... yellow. Because it doesn't de-yellow images that are already yellow. We have no idea what color it originally was.

If enough details are wrong, it might as well just be a picture of a different party, because you don't know which ones they are.

As for using a different image: drawing by hand and using AI aren't the only options. Besides AI,

  • There are actual free images you can use. As far as I know, this could be a literal photo of the party in question, and it's free: https://unsplash.com/photos/a-man-and-woman-dancing-in-a-room-with-tables-and-chairs-KpzGmDvzhS4
  • You could spend <1hr making an obviously shitty 'shop from free images with free image editing software. If you've ever shared a handmade crappy meme with friends, you know this can be a significantly entertaining and bonding act of creativity. The effort is roughly comparable to stick figures and the outcome looks better, or at least richer.

With all that said, and reiterating gwern's point above, I can't agree it achieved its intended effect. It is possible that jefftk put in a lot of effort to make sure the generated vibe is as accurate as could reasonably be, but the assumption is that someone generating an AI image isn't spending very much effort, because that's the point of using AI to generate images. There are better tools for someone making a craft of creating an image (regardless of their drawing skill). In order for that effort to be meaningful, (because unlike with artistic skill it doesn't translate to improved image quality,) he'd have to just tell us, "I spent a lot of time making sure the vibe was right, even though the image is still full of extra limbs." And this might actually be a different discussion, but I'd be immediately skeptical of that statement - am I really going to trust the artistic eye and the taste of someone who sat down for 2 hours to repeatedly generate a ghiblified AI image instead of using a tool that doesn't have a quality cap? So ultimately I find it more distracting, confusing, and disrespectful to read a post with an AI image, which, if carelessly used (which I have to assume it is), cannot possibly give me useful information. At least a bad stick figure drawing could give me a small amount of information.

It is possible that jefftk put in a lot of effort to make sure the generated vibe is as accurate as could reasonably be

I didn't. I picked out a photo that I was going to use to illustrate the piece, one host asked me not to use it because of privacy, another suggested Ghiblifying it and made one quickly on their phone. We looked at it and thought it gave the right impression despite the many errors.

As far as I know, this could be a literal photo of the party in question, and it's free: https://unsplash.com/photos/a-man-and-woman-dancing-in-a-room-with-tables-and-chairs-KpzGmDvzhS4

The vibe of the generated image is far closer to the real party than the image you linked.


Look at the fiddler's arms

The left arm is holding the fiddle and is not visible behind my body, while the right arm has the sleeve rolled up above the elbow and you can see a tiny piece of the back of my right hand poking out above my forearm. The angle of the bow is slightly wrong for the hand position, but only by a little since there is significant space between the back of the hand and the fingertips holding the bow.

(Of course, as I write in my post, it certainly gets a lot of other things wrong. Which is useful to me from a privacy perspective, though probably not the most efficient way to anonymize.)

I know that many of my friends are strongly opposed to AI-generated art, primarily for its effect on human artists.

Also, in general, I don't like the practice of using people's work without giving them any credit. Especially when used to make money. And even more so when it makes the people who made the original work much less likely to be able to make money.

I don't like the practice of using people's work without giving them any credit. Especially when used to make money. 

Do you dislike open source software? For most of it, the credit is just the license or the name. Quite similar to Ghibli, where a person drops the name of the art style.


And even moreso when it makes the people who made the original work much less likely to be able to make money. 

In open source stuff, backend libraries are less likely to get paid compared to frontend products; creating a product can make the situation worse for the OG person. It can be seen as predatory, but that's the intent of open source collaboration fwiw.

Do you dislike open source software? For most of it, the credit is just the license or the name. Quite similar to Ghibli, where a person drops the name of the art style.

If the artist says they're ok with a model being trained on their work, then it's relatively fine with me. Most artists explicitly are not and were never asked - in fact, most licensed their work in a way that means they should be paid for its use.


In open source stuff, backend libraries are less likely to get paid compared to frontend products; creating a product can make the situation worse for the OG person. It can be seen as predatory, but that's the intent of open source collaboration fwiw.

In art, the art is usually the product itself, and if it's used for something, that use is usually agreed upon between the artist and the user, unless the artist has explicitly said they're ok with it being used - e.g. some youtubers have said it's ok to use their music in any videos (although this isn't the same as it being used for training a model).

The main point here being respecting the work and consent of the creator.

There is an image diffusion model named 'Mitsua' (easy to load up in Stable Diffusion) which is trained only on public domain and donated training data, which I use for a similar purpose at work.
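For anyone wanting to try the same thing, here's a minimal sketch using the diffusers library; the model id is my assumption about where Mitsua is published on Hugging Face, so verify it (and the license terms) before relying on it:

```python
# Minimal sketch: loading a public-domain-trained diffusion model with
# diffusers. The id "Mitsua/mitsua-diffusion-one" is an assumption about
# where Mitsua lives on Hugging Face; check it before relying on this.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Mitsua/mitsua-diffusion-one",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a quick vibe sketch of a crowded house party, dancing").images[0]
image.save("vibe_sketch.png")
```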

I appreciate the ability to create quick "vibe sketches" of ideas I want to express in a post, in cases where I don't want a more precise method like a Mermaid chart, a table, or a true drawing/diagram.

I'm on the lookout for more models like this, because I don't like supporting any company that has historically been coy about its training data, which includes OpenAI and Anthropic.

Adobe seems to have gone through a little effort to do ethical sourcing of training data, so it's better than OpenAI and Anthropic in that regard even when it also isn't perfect.

For sure! Much like the AI safety scorecard, no one is out of the red, but it seems like some of the older publishing house type companies are trying to respect existing content licensing institutions. However, I've seen many creators and artists complain that it doesn't matter; it's already too overshadowed by the actions of OpenAI et al.

I'm on the lookout for more models like this

Here's a recent one where the quality is pretty good: f-lite. They say, "The models were trained on Freepik's internal dataset comprising approximately 80 million copyright-safe images."
