Transformative and generative in neural art

When I started experimenting with image processing using neural networks in September 2015, style transfer was the word of the day. Justin Johnson had just released his implementation, called neural-style, which made it possible to apply the style of one image (the style image) to another image (the content image). Despite the name and the hype around the method, style transfer did not really understand style in a general sense; it simply detected features of colour and texture in a single image and recreated the content image to match these features. Unfortunately, the term style transfer thus became associated with this limited method, instead of being understood as a generic term for analysing and producing a specific style.

For the next year or so, this limited style transfer dominated the neural art scene and found its way into popular use with instant-gratification apps like Prisma. The resulting proliferation of clichéd-looking images on social media contributed to style transfer getting a bad name among many who took AI art seriously.

Sometime in late 2016, generative adversarial networks (GANs) emerged on the scene and gradually became dominant. In principle, a GAN is a technique for training a generator (e.g. to generate totally new images resembling those used in training) by having another neural network, the discriminator, evaluate whether the generated images are good enough. Both networks start from scratch, and for the training to succeed, they have to learn gradually and at the same pace.

The idea of the GAN is, thus, to train a generator which can create new images from scratch. You feed in a series of random numbers (usually called a latent vector) and you get a new image. Therefore, it would appear that in contrast with style transfer, which is used to transform an image, GANs are generative. In general, the processes in neural art can be divided into transformative (going from one image to another) and generative (generating new images from scratch).
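The difference between the two kinds of process can be seen in what they take as input. The following is only a toy sketch in Python: `sample_latent`, `toy_generate` and `toy_transform` are hypothetical names, and the two functions are simple arithmetic stand-ins for trained networks, not real models. What matters is the shape of each call: one goes from random numbers to an image, the other from an image to an image.

```python
import math
import random


def sample_latent(dim, seed=None):
    """Draw a latent vector: a list of random numbers (standard normal)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generate(latent, height=4, width=4):
    """Generative: latent vector -> image.

    Hypothetical stand-in for a trained GAN generator. Each pixel is one
    latent value squashed into the range 0..1; no image is given as input.
    """
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            value = latent[(y * width + x) % len(latent)]
            row.append(1.0 / (1.0 + math.exp(-value)))
        image.append(row)
    return image


def toy_transform(image):
    """Transformative: image -> image.

    Hypothetical stand-in for a trained image-to-image model. It inverts
    the brightness, so the result always depends on an existing image.
    """
    return [[1.0 - pixel for pixel in row] for row in image]


# Generative: the only input is a series of random numbers.
new_image = toy_generate(sample_latent(dim=16, seed=1))

# Transformative: the input is an existing image.
derived_image = toy_transform(new_image)
```

Changing the seed gives a different latent vector and therefore a different image, which is the sense in which a generator is explored rather than steered.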

It turns out, however, that many very successful GANs are in fact used transformatively: pix2pix, which learns a transformation from a set of image pairs; its further developments CycleGAN and BicycleGAN; and a recent, very promising project called famos. Much of my own work is based on pix2pix; in the NIPS online gallery, three of the four works are done with pix2pix. Often, I first reduce photos to contours resembling woollen thread, then train pix2pix to fill them with colour again. The resulting effect is, however, far from realistic.

In another method, I use a program to apply "scratches" to photos, then train pix2pix to do the same. Again the resulting effect is different from the targets: the colour bleeds and blends in an interesting way.
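To give an idea of how such training pairs might be prepared, here is a minimal sketch in Python. It is not the program I actually use: the function names, the straight-line scratches and the representation of a photo as a grid of grey values are all assumptions made for illustration. Each pair consists of an untouched photo and a copy with random scratches drawn on it, which is the kind of source and target pair an image-to-image model like pix2pix is trained on.

```python
import random


def apply_scratches(image, count=5, value=255, seed=None):
    """Return a copy of a greyscale image with random straight scratches.

    `image` is a list of rows of grey values. The scratch model (straight
    lines between random points) is a hypothetical simplification.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    scratched = [row[:] for row in image]  # leave the original untouched
    for _ in range(count):
        x0, y0 = rng.randrange(width), rng.randrange(height)
        x1, y1 = rng.randrange(width), rng.randrange(height)
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        # Walk from the start point to the end point, marking each pixel.
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            scratched[y][x] = value
    return scratched


def make_training_pair(image, seed=None):
    """Build one (source, target) pair: the photo and its scratched copy."""
    return image, apply_scratches(image, seed=seed)


# A blank 16 x 16 "photo" stands in for real image data here.
photo = [[0] * 16 for _ in range(16)]
source, target = make_training_pair(photo, seed=0)
```

A model trained on many such pairs only ever sees examples of the effect, never a description of it, which may be one reason its output drifts away from the targets.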

In both cases, I use a GAN to apply a style without any style model, so it is not style transfer. Somehow the style is created in the process, perhaps due to the limitations and imperfections of training the GAN. The process is clearly transformative, not generative, as the resulting image retains enough of the original representation, and it is not possible with this process to produce images from scratch (no pun intended).

To recap, a transformative process is used to make a new image based on an existing one. A generative process produces images without any model or referent to represent, out of what it has learned and how it has been programmed.

There is a point to linking these categories to the intention of an artist. A transformation usually involves a human actor, and the act of transforming an image proceeds from her intention. On the other hand, the audience may not be able to say whether a work has been produced by transformation or generation. This is somewhat similar to the use of electronic musical instruments: a performer accustomed to an acoustic accordion tends to feel that the electronic accordion does not respond to her playing, while the listener notices nothing of this and may be fully satisfied with the music. The distinction between what and how may therefore be irrelevant to the audience, however crucial it may have been to the artist at work. My focus here, however, is mainly the viewpoint of the creative actor, for whom this distinction is and remains valid.

It seems to me that a process based on transformation places emphasis on expression, while a generative system involves exploration. In the first, an artist has the means for expression; in the second, one must enter into exploration, trying to find and select images which appear to make sense or appeal to the senses. This becomes very clear with BigGAN: you never know what you will get, so you have to find out by exploring. It is said that there are more possible images there than atoms in the universe. On the other hand, after you have seen a lot of them, you'll probably agree that they all tend to look alike regardless of what they represent.

We should not, however, equate generation with automation. It should be possible, at least in principle, to build an automated process of producing art based on transformation, one which would capture images, transform them and curate them. The essential difference is thus between a process based on an existing representation and a process of producing totally new ones. The distinction between generative and transformative appears to remain valid in the case of artificial creative actors as well.