Is Midjourney just copying other artists' work?

This is the big question, right? Is Midjourney stealing other people's art and claiming it as its own?

Well, let's take a look at how these systems work. But before I do, one observation: Midjourney DOES prohibit certain words, so it could just as easily prohibit the names of artists if it wanted to stop its product from producing works in the style of existing artists. This is clearly not a technological hurdle. So it's odd that artist names are allowed in prompts, considering the potential harm that could cause artists...

Okay, onward then.

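We don't actually know how Midjourney works internally, as it's a closed-source product at the moment. But we can assume it is similar to other machine learning image generators. One well-known approach is a Generative Adversarial Network (or GAN), and that's the one I'll use to explain the idea. (Many recent text-to-image systems use diffusion models instead, so Midjourney may not be a GAN at all, but the point about learning from examples rather than copying and pasting is the same.) GANs work by basically pitting two neural networks against each other.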

There's a ton of info on GANs today, so I'll skip that and just say that at its simplest, you can think of it like this: one part of the system draws a picture, and the other part tells it if it drew a picture that matches what it's supposed to be. We'll call them Artist and Critic. 

The Artist looks at a text prompt, in this case, "an old cottage on the bank of a stream in the sacred forest" and draws some shapes. 

The Critic, meanwhile, is out taking notes in the real world. On its travels it's found some images labelled 'cottage' and it jots down what it thinks a cottage is. Nobody hands the Critic a rulebook for this, so it ends up with criteria we humans don't really understand. Maybe a cottage has vertical sides. Maybe they slope inward. Maybe they're straight edges. Over time, and millions of images, it comes up with its own criteria for pretty much everything.

So now the Critic looks at the Artist's shapes and says, yeah, that's a cottage, but that's not a stream, do it over. And the Artist complies. And after millions and millions of shapes, the Artist gets pretty good at drawing what the Critic thinks the picture should look like.

This is how it's done. There is no copying and pasting. 
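If you like seeing this kind of thing in code, here is a toy sketch of the Artist/Critic loop. To be clear, this is NOT Midjourney's code (which nobody outside the company has seen), and every name in it (`train`, `critic`, `REAL_MEAN`, the `decay` term) is made up for illustration. Instead of pictures, the "real art" is just numbers clustered around 4, and the Artist has a single knob `b` that sets where its own numbers land. The small `decay` penalty on the Critic is my own addition to keep this toy from oscillating forever.

```python
import math
import random

REAL_MEAN = 4.0   # stand-in for "what real pictures look like" (assumed)
SPREAD = 0.5      # how much real and generated samples vary (assumed)


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def critic(x, w, c):
    """Critic's belief (between 0 and 1) that x came from the real data."""
    return sigmoid(w * x + c)


def train(steps=2000, batch=32, lr=0.05, decay=0.1, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # the Critic's parameters
    b = 0.0           # the Artist's single knob: where it centres its output
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, SPREAD) for _ in range(batch)]
        fake = [b + SPREAD * rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Critic step: gradient ascent on log D(real) + log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            miss = 1.0 - critic(x, w, c)
            gw += miss * x
            gc += miss
        for x in fake:
            fooled = critic(x, w, c)
            gw -= fooled * x
            gc -= fooled
        w += lr * (gw / batch - decay * w)
        c += lr * (gc / batch - decay * c)

        # Artist step: gradient ascent on log D(fake).
        # Note that `real` is not used here at all: the Artist only
        # ever hears the Critic's opinion of its own output.
        gb = sum((1.0 - critic(x, w, c)) * w for x in fake) / batch
        b += lr * gb
    return w, c, b


if __name__ == "__main__":
    w, c, b = train()
    print(f"Artist's learned centre: {b:.2f} (real data centre: {REAL_MEAN})")
```

The thing to notice is the Artist step: it never touches the real samples. All it stores is its own parameter, nudged by the Critic's feedback. On a typical run `b` should drift from 0 toward the centre of the real data, though toy GANs are twitchy, so treat that as an expectation rather than a promise.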

Then, the Critic also looks at different artists' styles. This guy uses lines that are close to each other. This other guy uses brush strokes that are wide at the middle and taper off. Again, we don't know what kind of criteria the Critic uses, which is part of the beauty of the system. We're letting the machine learn on its own, and letting it define the rules it uses to learn.

This results in a system that can do some amazing things. Here is "an old cottage on the bank of a stream in the sacred forest"



And here is "a sketch of an old cottage on the bank of a stream in the sacred forest drawn in ballpoint"

If you look closely, one of the drawings appears to have some text on it. The Critic thinks that's probably part of any good ballpoint pen drawing, but it doesn't know what the text is. Some squiggly stuff, definitely.

Midjourney kinda knows what ballpoint pen sketches look like, but it's not positive. You can see some straight lines, but a lot of blurry spots. Maybe the artists whose work it looked at smudged their hand across their drawings?


Okay so let's see it try that with... macaroni art from a child. Midjourney doesn't quite know what macaroni is, but in the top right image you can see it kinda figured out what macaroni shapes look like.

"an old cottage on the bank of a stream in the sacred forest drawn by a child"


Okay, let's try something different. Let's try H. R. Giger style. 

"a painting of an old cottage on the bank of a stream in the sacred forest by H.R. Giger --ar 16:9"


Let's go the other way and try Kinkade. Again, note the squiggly bits in the lower right. It thinks his signature is part of his style of painting, but it doesn't know what a signature is, just that there are some bright strokes on a darker surface.

"a painting of an old cottage on the bank of a stream in the sacred forest by thomas kinkade --ar 16:9"

And lastly, let's try the same thing but combining multiple styles...

"an old cottage on the bank of a stream in the sacred forest Peter Mohrbacher, Marc Simonetti, Mike Mignola --ar 16:9"


It's not copying anything out of the original artists' work other than some elaborate internal definition of what their "style" is. The rest of the image is generated the same way, regardless of the artist in question.

Personally, I don't see this as copying, at least not in the traditional sense. What do you think? 


And if you're wondering what this has to do with mocap, well, quite a bit actually. But that will be explained in a future post :)

