**The Gradient** @thegradient

As many suspected:

“Midjourney Founder Admits to Using a ‘Hundred Million’ Images Without Consent”

https://petapixel.com/2022/12/21/midjourny-founder-admits-to-using-a-hundred-million-images-without-consent/

PetaPixel · Dec 21, 2022 · Midjourney Founder Admits to Using a 'Hundred Million' Images Without Consent. It has outraged artists and photographers.

Dec 22, 2022, 02:29 AM · 87 boosts · 79 favorites

**Ryan Moulton** @moultano · Dec 22, 2022

@Riedl CLIP did too though.

**Mark Riedl** @Riedl · Dec 22, 2022

@moultano they all did as far as I know, but some are more secretive than others, so confirmation is good to report.

**Advadnoun** @Adverb · Dec 22, 2022

@Riedl @moultano It is funny to see OpenAI just not mention it and reap the benefits. I've had people sincerely praise OpenAI for how it treats artists relative to Stability, which is imo just a hilarious PR gap.

**Advadnoun** @Adverb · Dec 22, 2022

@moultano I'm probably getting close to using 1 million images with my brain, personally. Is that better or worse?

**Ryan Moulton** @moultano · Dec 22, 2022

@Adverb I don't know. I'm mostly staying out of the ethics of all this because I don't want to be subpoenaed in some future litigation against my employer, and because I find it all genuinely confusing.

**Advadnoun** @Adverb · Dec 22, 2022

@moultano This is wise!

**Ryan Moulton** @moultano · Dec 22, 2022

@Adverb Every analogy people make requires AI art to be "like" something else, and we're just arguing about which thing it's "like." But I don't think it's like anything else.

**Advadnoun** @Adverb · Dec 22, 2022

@moultano It's fair to say that the analogy is not exact here (Dryhurst talks about this too), though I think the principle is the same on many fronts.

The scale could never be the same, ofc.

**Ryan Moulton** @moultano · Dec 22, 2022

@Adverb I would be happier if the models trained with differential privacy. I think that would be closer to the norms of inspiration that artists expect from each other.

**Advadnoun** @Adverb · Dec 22, 2022

@moultano
To the end of preventing memorization?

I feel like it's already past human artists in some ways by having a similarity-queryable dataset if people are worried.

And that post by OpenAI on deduplication, and (ironically) the one on Stable Diffusion's regurgitation, which notes that the ImageNet LDM shows no significant memorization, make me very not-worried about memorization.

**Ted Underwood** @TedUnderwood · Dec 22, 2022

@Adverb @moultano I should already know those, but if you wanted to toss in a couple of links I would bookmark them.

**Advadnoun** @Adverb · Dec 22, 2022

@TedUnderwood @moultano definitely: https://openai.com/blog/dall-e-2-pre-training-mitigations/
This one is a big deal!!!

https://arxiv.org/abs/2212.03860
And this one is frustrating but ironically reassuring, given that they find the dataset seems to be the primary problem/mitigator.

OpenAI · DALL·E 2 Pre-Training Mitigations: In order to share the magic of DALL·E 2 with a broad audience, we needed to reduce the risks associated with powerful image generation models. To this end, we put various guardrails in place to prevent generated images from violating our content policy. This post focuses on pre-training mitigations, …

**Advadnoun** @Adverb · Dec 22, 2022

@TedUnderwood @moultano They LITERALLY CANNOT DETECT REPLICATION WITH IMAGENET!!! (Pardon my screaming.)

But nobody bothers reading the paper.

**Daniel Lowd** @lowd · Dec 22, 2022

@Adverb @moultano @TedUnderwood Yes, the framing in the abstract and intro doesn’t really match the results.

**Advadnoun** @Adverb · Dec 22, 2022

@lowd @moultano @TedUnderwood Yeah! Not to mention the model card for stable diffusion explicitly states this issue!

But because of this paper, many, many people think this was some hidden secret. I was having to argue over this with an ML industry person just yesterday.

**Ted Underwood** @TedUnderwood · Dec 22, 2022

@Adverb @moultano Yes, actually I do remember reading that paper and thinking "hmm — if I'm reading this rightly it's only a problem for small models." But no one talked about it that way so I thought I was crazy.