The AI Art Revolution and its impact on photography

Joules

I have been super interested in absorbing content around the latest advances in AI art generators (DALL-E 2, Midjourney, Imagen, Stable Diffusion, ...) for the last few weeks and have had great fun playing around with them myself. I find the subject fascinating on a number of levels, and both the results and the discussion around them very engaging. From what I have seen, thoughts and feelings about this technology vary widely, while a lot of people seemingly have not taken note of what is happening at all, or do not think it is a big deal. That surprises me, as to me it is totally mind-blowing how good and accessible this technology has already become, and I feel it has huge implications for the creative industries in the short term, and perhaps even for the camera market to a degree.

For those who may be unaware of what I am talking about: research into creating images with artificial neural networks has been going on for a while and has produced interesting results for certain niche applications in the past, like generating pictures of realistic human faces (there are sites that produce a unique face each time you reload the page), or of animals, or applying a reference style to a given image. But those mostly stayed in the realm of research. More recently, newer approaches have emerged that can generate images of arbitrary subject matter and are easy enough to use to be incorporated into actual workflows. In the simplest form, these tools take only a short text describing the image as input and return a generated image that tries to adhere to that prompt as closely as possible. Furthermore, they allow creating new images from existing ones in a variety of interesting ways: completing a rough sketch into a detailed piece, creating small variations of the input image, or replacing a section of it with something newly generated, like a more advanced version of content-aware fill.

So although the most talked-about and most accessible use for these tools at the moment is the one where only text goes in and an image comes out, the technology actually provides much finer control over composition and details to guide the AI toward a desirable result. And although these tools currently produce images that are usually below one megapixel by default, there are also approaches for intelligent upscaling and for stitching multiple generated images together to drastically improve resolution.

As someone who does photography purely as a hobby, because I enjoy both the process of creating an image and looking at the result and its aesthetic properties, the fun in using these tools comes from exploring the sometimes truly stunning beauty of the results and from experimenting with types of imagery that I usually could not, or would not, be involved in: portraits of human models, luxury cars, exotic architecture, fantastical landscapes and so on. To include some results I have gotten using a local installation of Stable Diffusion with a slightly tweaked version of the relevant Python script:

'Beautiful rose princess, standing in a vivid landscape. Photorealism, art by artgerm, detailed, RTX, 35mm 1.4'
[Attached: four 'rose princess' generations, LMSDiscreteScheduler and PNDMScheduler, 50 steps, guidance 9 to 10, various seeds]

This particular AI model is trained on freely available images found across the internet. In particular, it is trained on the alternative description texts associated with web images, which are displayed if an image cannot be loaded, or read out for people with visual impairments. So adding a variety of words commonly found on the internet to the end of a natural-language prompt can help guide the look of the image. A side effect of this training input is that the model has learned that people often put little watermarks on images, and it tries to imitate this behavior in its own creations, as you can see in some of these examples.

I would never go through the process necessary to create such images with a camera. That's just not the kind of photography I enjoy, and it is associated with too much work and expense for a hobby anyway. But each of these images takes just under 20 seconds to create, and it is quite addictive to get something this visually pleasing for so little effort. It almost becomes a game, trying to steer the computer in the direction that best suits the idea in one's mind, or just exploring what it comes up with when given room to experiment on its own. Here are some more that I liked quite a lot, some with prompts I created myself and some only slightly tweaked from the vast gallery of generations found on Lexica:

[Attached: three car generations, LMSDiscreteScheduler and PNDMScheduler, 65 to 80 steps, guidance 9 to 10]

[Attached: a cat, an epic mountain landscape and a science base, 50 to 65 steps, guidance 6 to 7]

As I said, I find it mind-blowing that technology has reached a point where I can write a little sentence, and a program that is completely free and runs on my mid-range consumer graphics card is able to translate that text into an image that often accurately captures what was described.

On the one hand, I am very excited for this technology to become even more accessible by being integrated into Photoshop, for example, so that one can actually tell content-aware fill what exactly to put into the filled area. Or selecting a section of an image and asking for variations of it. Or variations of a face, for example to fix a group shot where one person managed to blink or talk at the wrong moment.

On the other hand, I am curious where the discussion around this will go. Text and speech recognition have come a long way, but somehow this form of human-machine interaction seems to me like the first one that makes it tangible just how well state-of-the-art neural networks can 'understand' natural language.

There are, however, controversies around devaluing the work of digital artists who produce these kinds of results through years of experience and practice, and around the fact that one can imitate famous styles by simply typing the name of the respective artist.

I am also interested in seeing how this will impact the camera market. Who in the mainstream population even needs a low-end camera kit anymore if you can do all the documentary photography with your phone and create the Instagram-style show-off pieces with a computer? Will this level of aesthetic quality drown out true photography on mainstream social media platforms, where even photography is already heavily amplified through filters anyway?

And the implications for misinformation are of course also huge. This model has no restrictions. You can type in the name of any sufficiently famous politician and get a photorealistic image of them doing whatever you wrote in the prompt. But using the medium of photography as evidence has been getting ever shakier anyway, so I think this just puts the final nail into an already well-sealed coffin.

If this is old hat for everybody reading, so be it. But I just felt like sharing some of this, and perhaps finding out about other interesting applications and use cases if somebody here has used these tools and is willing to give some insight into their usage.
 

Joules

On another note: I have seen people describe the way this technology works as looking through some form of database and just combining elements of different pictures together. That is not at all accurate. These models actually create new imagery, generalising from the images they were fed during training to a much greater space of possible pixel arrangements. Interestingly enough, the way these images come into existence actually starts with pure noise; from there, the model essentially executes an elaborate step-by-step denoising algorithm that is guided by the text prompt. So in a way, these models are the world's best denoising solutions. I tweaked the script I use to capture the in-between images and turned them into a GIF for illustration purposes (a sketch of the capture logic follows the animation):

[Attached: prog_comp.gif, an animation of the image emerging step by step from pure noise]
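
In case anyone wants to replicate this, here is a minimal sketch of how such in-between frames can be captured, assuming the Hugging Face 'diffusers' library and a version of its StableDiffusionPipeline that supports the callback argument; the model name, the 0.18215 latent scaling factor and the .sample attribute on the decoder output match the Stable Diffusion v1 era of the library and may differ in other versions:

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

frames = []

def grab_frame(step, timestep, latents):
    # The latents are tiny (1 x 4 x 64 x 64 for a 512 x 512 image); the VAE
    # decoder is the upscaling step that turns them into full-size pixels.
    with torch.no_grad():
        decoded = pipe.vae.decode(latents / 0.18215).sample
    decoded = (decoded / 2 + 0.5).clamp(0, 1)  # map from [-1, 1] to [0, 1]
    array = (decoded[0].permute(1, 2, 0).cpu().float().numpy() * 255).astype("uint8")
    frames.append(Image.fromarray(array))

# Invoke the callback after every denoising step to record the progression.
pipe("A majestic phoenix with feathers made of flames",
     num_inference_steps=50, callback=grab_frame, callback_steps=1)

# Stitch the captured frames into a GIF of the denoising process.
frames[0].save("prog_comp.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)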

If you are wondering why the noise looks kind of weird: the model actually operates at an extremely tiny resolution internally (just 64 x 64 pixels, I believe), but incorporates an upscaling step into the creation of the image. All the stuff I generated above is 512 x 512, because that is the resolution the model was trained on and where it currently does best, and also because my GPU does not have enough memory for larger resolutions; I would have to use the much slower CPU for that, bumping the time to generate an image from seconds to minutes.

Also worth sharing, since I mentioned it: here are some examples of the feature where I painted a crude sketch with my finger on my phone and let the AI fill in the details:

[Attached: 'A bunny sitting in a grassy field', 'A Caribbean island with a palm tree', 'A crystal tower with intricate details and glowing crystals', 'A magical forest with purple and blue trees and glowing mushrooms', 'A majestic phoenix with feathers made of flames']

These were just made using the online demo that the model's creators are hosting.

It is a simplified version of the actual program, lacking the settings that give you more control over the generations. Nonetheless, quite cool.
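
With the full model running locally, the same sketch-to-image trick goes through an image-to-image pipeline. A minimal sketch, again assuming the 'diffusers' library; the file names here are just placeholders, and the input-image argument has been renamed between library versions (init_image in older releases, image in newer ones):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# A crude finger painting; the model fills in the details.
sketch = Image.open("bunny_sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="A bunny sitting in a grassy field",
    image=sketch,        # called init_image in older diffusers releases
    strength=0.75,       # how far the result may drift from the input sketch
    guidance_scale=7.5,  # how strictly to follow the text prompt
).images[0]
result.save("bunny_detailed.png")

The strength parameter is the interesting dial here: low values stay close to the input sketch, while high values give the model more freedom.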

For those interested in giving the full model a try: as Stable Diffusion is open source and can be integrated into various other projects, there are currently a number of options to choose from. As I said, I run it locally with a Python script, but that is a fairly involved setup process. There are also online hosted versions like the one below, which work well and are either free or offer a decent free trial period:
https://beta.dreamstudio.ai/dream (official online interface of the company behind the model)
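
For reference, the local route boils down to very little code. Here is a minimal sketch of a text-to-image script, again assuming the 'diffusers' library; the steps, guidance and seed values mirror the settings embedded in the file names of my generations above:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# A fixed seed makes a generation reproducible; changing it explores variations.
generator = torch.Generator("cuda").manual_seed(164444876)

image = pipe(
    "Beautiful rose princess, standing in a vivid landscape. "
    "Photorealism, art by artgerm, detailed, RTX, 35mm 1.4",
    num_inference_steps=50,  # denoising steps
    guidance_scale=9,        # how strongly to push the image toward the prompt
    generator=generator,
).images[0]
image.save("rose_princess.png")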
 

unfocused

Fascinating. And, hard to wrap my head around.

My initial reaction is that I don't see this changing my own photography.

For me, photography falls into the broad categories of personal expression and attempting to capture my own perspective of what I find in life and in nature. In both cases, the goal is to use my own skills to come as close as possible to getting what I would consider a "perfect" picture. The fun is in striving and coming close. Using AI to augment my own skills is uninteresting to me personally.

Now, when I did work for pay, I could see AI as a way to help get the client what they want, and I suspect that in the near future AI will have a profound and not necessarily positive impact on commercial photography and photographers. On the other hand, things seem to go in cycles, and we may reach a point when things circle back to 'natural' photos.
 

Joules

My initial reaction is that I don't see this changing my own photography.
I agree. I sometimes get asked whether I use my phone for taking pictures as well as my DSLR, and my answer is usually that I do not enjoy using a smartphone, as it takes away the high degree of control I have over the image thanks to the interchangeable lens, buttons and dials.

In that respect, this approach to creating images should be even less enjoyable. And to a degree that is true. I think the core difference here is the form of interaction. Using words and sketches to convey an idea is very human and natural to us, I think. And this is the first occasion where it actually works well enough, most of the time, to feel rewarding when one's input yields a nice output.

Which, mind you, is not always the case on the first try. This is still very much a new and evolving technology, and a lot of the generated images come out looking wonky. So my showcases above are certainly cherry-picked, but due to how quickly one can iterate on results, one has a large basket to pick from.

For me, photography falls into the broad categories of personal expression and attempting to capture my own perspective of what I find in life and in nature.
I also enjoy photography for its ability to change how I experience a situation, motivating me to pay attention to the visual aspects of the world around me. That is not something that can be improved through any form of technology, and it doesn't need to be.

I also don't consider my images to be 'art' or express anything that the viewer should interpret.

That's where the discussion around this seems the least productive. People going back and forth about whether it can be considered art when the human input is so minimal, or making statements along the lines of computers not being able to have creativity, don't seem to matter much if someone who doesn't know how an image was made can recognise artistic intent and creativity in it.

When computing devices became able to do what a traditional computer (a person doing computations) was previously required for, the use cases that built on the numbers usually did not care how the results came to be. If the output is sufficiently indistinguishable from a traditional human one, does it really matter if some people don't acknowledge it?

Now, when I did work for pay, I could see AI as a way to help get the client what they want, and I suspect that in the near future AI will have a profound and not necessarily positive impact on commercial photography and photographers.
The most interesting applications I have seen so far come from that kind of work. Using the ability to create tons of detailed variations in a short time to better discuss ideas with clients seems to be something that photographers, painters, 3D artists and the like all have a use for.

I've also seen photographers creating complicated scenes with the AI and photoshopping their models into them, or using the AI for props or headpieces and combining those with pictures of clients. Giving creatives the ability to at least come close to offering a result that would otherwise require skill or access in unrelated fields, like costume making and set building, should have at least some sort of effect on commercial photography.

Before I looked into all of this, I was unaware that Adobe releases Photoshop beta versions that give one access to experimental features. Among the features in the current beta is one that 'fixes' faces, seemingly at least partially intended to address the small mistakes that AI models often make when creating faces. At the end of the day, these tools are only relevant if they are convenient to use, and luckily it seems there is a lot to look forward to in this fast-moving field of technology.
 

Joules

Interesting that this has attracted so little attention here.

On digital art platforms, the discussion around this has been heated, and it seems to have concluded with the majority of sites (including stock photo sites) deciding that no AI-generated content is allowed on their platform.

Is everybody here actually so caught up with the tech that it does not impress anymore? I am continuously blown away by it. By now, for example, it is possible to add arbitrary people and visuals to Stable Diffusion's vocabulary by presenting it with a couple of images. In other words, people can put themselves, their pets or anybody they want into their text prompts and actually produce images that feature their likeness.

And being open source for a change, this is not just some kind of cool tech demo, but something that will become more and more accessible as people tweak the tools for integration into their workflows. The first Photoshop plugins are already out there.

Crazy times. To me, at least.
 

AlanF

I can save huge amounts of money on nature photography by discarding my telephoto lenses and associated gear, along with the carrying of them and the travelling, and just conjuring up the scenes I want. Or just watch David Attenborough, as is already available. You can take that straight as it is, or as an allegory.
 

Joules

I can save huge amounts of money on nature photography by discarding my telephoto lenses and associated gear, along with the carrying of them and the travelling, and just conjuring up the scenes I want. Or just watch David Attenborough, as is already available. You can take that straight as it is, or as an allegory.
Fair enough. As I mentioned, most of the joy I get from my photography comes from the process behind it as well.

That probably applies to the majority of people who do this as a hobby. And many on this forum in particular.

But I wonder how much of photography in total is driven by this as opposed to the need for images to be used in some context.

All scenarios where images, and those who produce them, compete in some fashion, be that for space on a web page, viewer attention or money, appear likely to be impacted by these developments.

The accessibility of photography as a tool through smartphones has already shaped so much of modern culture. I feel it is important for people to be aware that the barrier to entry to a lot of different spaces has been altered.

An image says more than a thousand words. When everyone can create images, does that just enable more expression? Or will people become unable to discern the meaning, and what is meaningful, in the context of reality?
 

shadow

Interesting that this has attracted so little attention here.
I didn't see your thread until today, but I posted about AI and Midjourney before.


So I thought to add more news I just read on DPReview: Adobe was exposed for installing an opt-out button, as they are training AI on their "Creative Cloud". By default, all users must opt out manually, just as the usual surveillance and cloud scraping done by Apple, Google, etc. must be opted out of (being opted in is the default by design) if you want your stuff kept private and unshared. Meanwhile, how long has Adobe been doing this, and have they already stealthily taken your Lightroom collection and absorbed it into their machine?

In the next 5 years, AI will begin to replace many humans in any "procedurally" driven job. If a task can be defined with most decision-tree variables answered, why hire humans? Fortunately, blue-collar jobs cannot be replaced with robots, and there are no robots to unplug your sewer drain.

ChatGPT at OpenAI has really been taking off for answers, but it is still "learning", and its answers to subjective questions are of course inaccurate. Google wrote that they consider it a threat to their multi-billion-dollar business, but they have their own AI. Anyone with kids going to college, or in mid-level management, might look ahead and reconsider learning something new.
 

koenkooi

[...]
So I thought to add more news I just read on DPReview: Adobe was exposed for installing an opt-out button, as they are training AI on their "Creative Cloud". By default, all users must opt out manually, just as the usual surveillance and cloud scraping done by Apple, Google, etc. must be opted out of (being opted in is the default by design) if you want your stuff kept private and unshared. Meanwhile, how long has Adobe been doing this, and have they already stealthily taken your Lightroom collection and absorbed it into their machine?
[...]
Adobe has since clarified what that checkbox does: it allows you to opt out of having your images used to train their tools, like content-aware fill and automatic subject masking. It is not used for 'generative AI'. According to Adobe, that opt-out button has been there for several years.
In the explanation of what the toggle does, Adobe claims that only content that has been uploaded to their cloud is being used; Adobe explicitly denies using (or uploading) any of your local content.

While I disagree with the default being "yes, use my data", this deals with images people have explicitly chosen to upload to Adobe's cloud. You cannot and should not have any expectation of privacy if you send unencrypted things to someone else's computer.
 