The AI Art Revolution and its impact on photography

Joules

I have been super interested in absorbing content around the latest advances in AI art generators (DALL-E 2, Midjourney, Imagen, Stable Diffusion, ...) for the last few weeks and have had great fun playing around with them myself. I find the subject fascinating on a number of levels, and both the results and the discussion around them very engaging. From what I have seen, thoughts and feelings about this technology vary widely, while a lot of people seemingly have not taken note of what is happening at all, or do not think it is a big deal. I find that surprising, because to me it is totally mind-blowing how good and accessible this technology has already become, and I feel it has huge implications for the creative industries in the short term, and perhaps even for the camera market to a degree.

For those who may be unaware of what I am talking about: research into creating images with artificial neural networks has been going on for a while and has produced interesting results for certain niche applications in the past, like generating pictures of realistic human faces (this site will generate a unique human face each time you reload it), or animals, or applying a reference style to a given image. But those mostly stayed in the realm of research. More recently, newer approaches have emerged that can generate images of arbitrary subject matter and are easy enough to use to be incorporated into actual workflows. In the simplest form, these tools take only a short text describing the image as input and return a generated image that tries to adhere to the prompt as closely as possible. They also allow creating new images from existing ones in a variety of interesting ways, like completing a rough sketch into a detailed piece, creating small variations of the input image, or replacing a section of it with something newly generated, like a more advanced version of content-aware fill.
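To give an idea of how simple the 'text in, image out' part has become in practice, here is a minimal sketch using the Hugging Face diffusers library, which is one common way of running Stable Diffusion from Python (the checkpoint name and API details depend on the version you install, so treat this as an illustration rather than a recipe):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained Stable Diffusion weights onto the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # assumed checkpoint name
    torch_dtype=torch.float16,
).to("cuda")

# Text goes in, a finished PIL image comes out.
image = pipe("A red fox sitting in a snowy forest, photorealistic").images[0]
image.save("fox.png")
```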

So although the most talked-about and most accessible use for these tools at the moment is the one where only text goes in and an image comes out, the technology actually provides much finer control over composition and details to guide the AI to a desirable result. And although these tools currently produce images below one megapixel by default, there are also approaches for intelligent upscaling and for stitching multiple generated images together to drastically improve resolution.

As someone who does photography purely as a hobby, because I enjoy both the process of creating an image and looking at the result and its aesthetic properties, the fun in using these tools comes from exploring the sometimes truly stunning beauty of the results and from experimenting with types of imagery I could not, or would not, usually be involved in, like portraits of human models, luxury cars, exotic architecture, fantastical landscapes and so on. Here are some results I got using a local installation of Stable Diffusion with a slightly tweaked version of the relevant Python script:

'Beautiful rose princess, standing in a vivid landscape. Photorealism, art by artgerm, detailed, RTX, 35mm 1.4'
Rose_Princess_LMSDiscreteScheduler_Steps 50_Guidance 9_Seed 164444876.png Rose_Princess_LMSDiscreteScheduler_Steps 50_Guidance 10_Seed 875489223.png Rose_Princess_PNDMScheduler_Steps 50_Guidance 9_Seed 199188871.png Rose_Princess_PNDMScheduler_Steps 50_Guidance 9_Seed 222841616.png
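In case the cryptic file names need decoding: scheduler, step count, guidance scale and seed are exactly the knobs my script exposes. Reusing the pipe (and torch import) from the sketch above, the first of these images corresponds roughly to the following call (again diffusers-flavored and version-dependent):

```python
from diffusers import LMSDiscreteScheduler

# Swap the sampling scheduler (e.g. LMSDiscrete instead of the default PNDM).
pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)

# A fixed seed makes a generation exactly reproducible.
generator = torch.Generator("cuda").manual_seed(164444876)

image = pipe(
    "Beautiful rose princess, standing in a vivid landscape. "
    "Photorealism, art by artgerm, detailed, RTX, 35mm 1.4",
    num_inference_steps=50,  # more steps = more refinement, but slower
    guidance_scale=9,        # how strictly to stick to the prompt
    generator=generator,
).images[0]
```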

This particular AI model is trained on freely available images found across the internet. Specifically, it is trained on the alternative description texts associated with web images, which are displayed if an image cannot be loaded, or read out for people with visual impairments. So adding a variety of words commonly found on the internet to the end of a natural-language prompt can help guide the look of the image. A side effect of this training input is that the model has learned that people often put little watermarks on images, and it tries to imitate this behavior in its own creations, as you can see in some of these examples.

I would never go through the process necessary to create such images with a camera. That is just not the kind of photography I enjoy, and it involves too much work and expense for a hobby anyway. But each of these images takes just under 20 seconds to create, and it is quite addictive to get something this visually pleasing for so little effort. It almost becomes a game, trying to steer the computer in the direction that best suits the idea in one's mind, or just exploring what it comes up with when given room to experiment on its own. Here are some more that I liked quite a lot, some with prompts that I created myself and some slightly tweaked from the vast gallery of generations found on Lexica:

Sports_Car_LMSDiscreteScheduler_Steps 80_Guidance 10_Seed 469744603.png Car_PNDMScheduler_Steps 65_Guidance 9_Seed 33197464.png Concept_Car_LMSDiscreteScheduler_Steps 80_Guidance 10_Seed 996896159.png

Catt_Seed 675829486_PNDMScheduler_Steps 50_Guidance 6.png Epic_Mountains_LMSDiscreteScheduler_Steps 50_Guidance 7_Seed 95893030.png Science_Base_Seed 211380639_PNDMScheduler_Steps 65_Guidance 6.png

As I said, I find it mind-blowing that technology has reached a point where I can write a little sentence and a completely free program, running on my mid-range consumer graphics card, can translate that text into an image that often accurately captures what was described.

On the one hand, I am very excited for this technology to become even more accessible by being integrated into Photoshop, for example, so that one can actually tell content-aware fill what exactly to put into the filled area. Or select a section of an image and ask for variations of it. Or variations of a face, for example to fix a group shot where one person managed to blink or talk at the wrong time.
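Parts of that wish can already be tried with the open model: inpainting, i.e. masking off a region and describing what should replace it, looks roughly like this in diffusers (the inpainting checkpoint name is my assumption here; use whatever weights your install provides):

```python
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"  # assumed checkpoint name
).to("cuda")

init = Image.open("group_shot.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = regenerate

# Like content-aware fill, except you get to say what goes into the hole.
result = pipe(
    prompt="a smiling face with open eyes, natural light",
    image=init,
    mask_image=mask,
).images[0]
result.save("group_shot_fixed.png")
```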

On the other hand, I am curious where the discussion around this will go. Text and speech recognition have come a long way, but this form of human-machine interaction seems to me like the first one that makes tangible just how well state-of-the-art neural networks can 'understand' natural language.

There are, however, controversies around devaluing the work of digital artists who produce these kinds of results through years of experience and practice, and also around the fact that one can imitate famous styles by simply typing the name of the respective artist.

I am also interested in seeing how this will impact the camera market. Who in the mainstream population even needs a low-end camera kit anymore if you can do all the documentation photography with your phone and create the Instagram-style show-off pieces with a computer? Will this level of aesthetic quality drown out true photography on mainstream social media platforms, where even photography is already heavily amplified through filters anyway?

And the implications for misinformation are of course also huge. This model has no restrictions. You can type in the name of any sufficiently famous politician and get a photorealistic image of them doing whatever you wrote in the prompt. But the medium of photography has been getting ever shakier as evidence anyway, so I think this just puts the final nail into an already well-sealed coffin.

If this is old hat for everybody reading, so be it. But I just felt like sharing some of this, and perhaps I can find out about some other interesting applications and use cases if somebody here has used these tools and is willing to give some insight into their usage.
 

Joules

On another note: I have seen people describe the way this technology works as looking through some form of database and just combining elements of different pictures together. That is not at all accurate. These models actually create new imagery, generalizing from the images they were fed during training to a much greater space of possible pixel arrangements. Interestingly, the way these images come into existence actually starts with pure noise; from there, the model essentially executes an elaborate step-by-step denoising algorithm that is guided by the text prompt. So in a way, these models are the world's best denoising solutions. I tweaked the script I use to capture the in-between images and turned them into a GIF for illustration purposes:

prog_comp.gif
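For anyone who wants to reproduce this: the pipeline accepts a per-step callback that hands you the current latents, which you can decode and collect as frames. My tweak amounts to something like the following sketch (the callback signature and the 0.18215 latent scaling factor are taken from the diffusers version I use; yours may differ):

```python
import torch
from PIL import Image

frames = []

def grab_frame(step, timestep, latents):
    # Decode the partially denoised latents back into pixel space.
    with torch.no_grad():
        img = pipe.vae.decode(latents / 0.18215).sample
    img = ((img / 2 + 0.5).clamp(0, 1) * 255)[0]  # [-1, 1] -> [0, 255]
    arr = img.permute(1, 2, 0).cpu().numpy().astype("uint8")
    frames.append(Image.fromarray(arr))

pipe("epic mountains at sunset", num_inference_steps=50,
     callback=grab_frame, callback_steps=1)

# Stitch the captured steps into an animated GIF.
frames[0].save("prog_comp.gif", save_all=True,
               append_images=frames[1:], duration=100, loop=0)
```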

If you are wondering why the noise looks kind of weird: the model actually operates at an extremely tiny resolution internally (just 64 x 64 pixels, I believe), but incorporates an upscaling step into the creation of the image. Everything I generated above is 512 x 512, because that is the resolution the model was trained on and where it currently does best. It is also a memory question: my GPU does not have enough memory for larger resolutions, so I would have to use the much slower CPU for those, bumping the time to generate an image from seconds to minutes.
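You can see that internal resolution directly in the tensor shapes: the denoising network works on a 4-channel latent that is 8x smaller than the output in each dimension, and the 'upscaling step' is a decoder network that turns it back into pixels. Purely illustrative, reusing the pipe from above (decoding random noise gives garbage, but the shapes show the 8x relationship):

```python
# A 512 x 512 output corresponds to a 4-channel 64 x 64 latent (512 / 8 = 64).
latents = torch.randn(1, 4, 64, 64, device="cuda", dtype=pipe.vae.dtype)

with torch.no_grad():
    decoded = pipe.vae.decode(latents / 0.18215).sample

print(decoded.shape)  # torch.Size([1, 3, 512, 512])
```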

Also worth sharing, since I mentioned it: here are some examples of the feature where I painted a crude sketch with my finger on my phone and let the AI fill in the details:

A bunny sitting in a grassy field.png A carribean island with a palm tree.png A crystal tower with intricate details and glowing crystals.png A magical forrest with purple and blue trees and glowing mushrooms.png A majestic phoenix with feathers made of flames.png

These were made using just the online demo that the creators are hosting here:

It is a simplified version of the actual program, lacking the settings that give you more control over the generations. Nonetheless, quite cool.
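If you run the model locally, the same sketch-to-image idea is exposed as an img2img pipeline: the drawing is partially noised and then denoised under the guidance of the prompt. Roughly like this in diffusers (keyword names vary between versions; older ones call the image argument init_image):

```python
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # assumed checkpoint name
).to("cuda")

sketch = Image.open("bunny_sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="A bunny sitting in a grassy field",
    image=sketch,
    strength=0.75,       # 0 = keep the sketch as-is, 1 = ignore it entirely
    guidance_scale=7.5,
).images[0]
result.save("bunny.png")
```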

For those interested in giving the full model a try: since Stable Diffusion is open source and can be integrated into various other projects, there are currently a number of options to choose from. As I said, I run it locally with a Python script, but that is a fairly involved setup process. There are also hosted online versions like the one below, which work well and are either free or offer a decent free trial period:
https://beta.dreamstudio.ai/dream (official online interface of the company behind the model)
 

unfocused

Fascinating. And hard to wrap my head around.

My initial reaction is that I don't see this changing my own photography.

For me, photography falls into the broad categories of personal expression and attempting to capture my own perspective of what I find in life and in nature. In both cases, the goal is to use my own skills to come as close as possible to getting what I would consider a "perfect" picture. The fun is in striving and coming close. Using AI to augment my own skills is uninteresting to me personally.

Now, when I did work for pay, I could see AI as a way to help get the client what they want, and I suspect that in the near future AI will have a profound, and not necessarily positive, impact on commercial photography and photographers. On the other hand, things seem to go in cycles, and we may reach a point where things circle back to "natural" photos.
 

Joules

unfocused said:
"My initial reaction is that I don't see this changing my own photography."
I agree. I sometimes get asked whether I use my phone for taking pictures as well as my DSLR, and my answer is usually that I do not enjoy using a smartphone, as it takes away the high degree of control that I have over the image thanks to the interchangeable lens, buttons and dials.

In that respect, this approach to creating images should be even less enjoyable. And to a degree, that is true. I think the core difference here is the form of interaction. Using words and sketches to convey an idea is very human and natural to us. And this is the first occasion where it actually works well enough, most of the time, to feel rewarding when one's input yields a nice output.

Which, mind you, is not always the case on the first try. This is still very much a new and evolving technology, and a lot of the generated images come out looking wonky. So my showcases above are certainly cherry-picked, but because of how quickly one can iterate on results, one has a large basket to pick from.

unfocused said:
"For me, photography falls into the broad categories of personal expression and attempting to capture my own perspective of what I find in life and in nature."
I also enjoy photography for its ability to change how I experience a situation, motivating me to pay attention to the visual aspects of the world around me. That is not something that can be improved through any form of technology, and it doesn't need to be.

I also don't consider my images to be 'art' or to express anything that the viewer should interpret.

That's where the discussion around this seems the least productive to me. People going back and forth about whether it can be considered art when the human input is so minimal, or making statements along the lines of computers not being able to have creativity, don't seem to matter much if someone who doesn't know how an image was made can recognize artistic intent and creativity in it.

When computing devices became able to do what a traditional computer (a person doing computations) was previously required for, the use cases that built on those numbers usually did not care how the results came to be. If the output is sufficiently indistinguishable from the traditional human one, does it really matter if some people don't acknowledge it?

unfocused said:
"Now, when I did work for pay, I could see AI as a way to help get the client what they want and I suspect that in the near future, AI will have a profound and not necessarily positive impact on commercial photography and photographers."
The most interesting applications I have seen so far come from that kind of work. Using the ability to create tons of detailed variations in a short time to better discuss ideas with clients seems to be something that photographers, painters, 3D artists and the like all have a use for.

I've also seen photographers creating complicated scenes with the AI and photoshopping their models into them, or using the AI for props or headpieces and combining those with pictures of clients. Giving creatives the ability to at least come close to offering a result that would otherwise require skill or access in unrelated fields, like costume making and set building, should have at least some effect on commercial photography.

Before I looked into all of this, I was unaware that Adobe releases Photoshop beta versions that give one access to experimental features. Among the features in the current beta is one that 'fixes' faces, seemingly at least partially intended to address the small mistakes that AI models often make when creating faces. At the end of the day, these tools are only relevant if they are convenient to use, and luckily it seems there is a lot to look forward to in this fast-moving field of technology.