Neural art: some practical problems
In another article, I introduced a distinction between transformative and generative processes in neural art. I will now review the practical issues associated with these two kinds of processes.
My experience is that transformative processes, both style transfer and pix2pix with its derivatives, are very stable and easy to handle. In style transfer, you have to become familiar with how the parameters affect the process, but you usually get results anyway, and it is possible to learn from experience. Pix2pix too, even though it is GAN-based, works very reliably and does not require extensive searching for the correct settings to make it learn the transformation.
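To make concrete what those parameters are, here is a minimal sketch of the loss balance at the heart of Gatys-style transfer. It is illustrative only, not my actual code: the layer names are placeholders, and content_weight and style_weight are the knobs whose effect one learns by experience.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # Channel-wise correlations of a feature map: the standard
    # representation of "style" in Gatys-style transfer.
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def transfer_loss(gen, content, style, content_weight=1.0, style_weight=1e4):
    # gen, content and style are dicts of feature maps taken from a
    # pretrained network; the layer names below are placeholders.
    content_loss = F.mse_loss(gen["conv4"], content["conv4"])
    style_loss = sum(
        F.mse_loss(gram_matrix(gen[l]), gram_matrix(style[l]))
        for l in ("conv1", "conv2", "conv3"))
    # The ratio of the two weights largely determines the character
    # of the result: more style weight, stronger stylisation.
    return content_weight * content_loss + style_weight * style_loss
```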
With generative GANs, the situation is very different. It is quite difficult to train a GAN to produce images of useful resolution (especially 512px and higher). I have now been working for eight months on my own GAN code, trying out different techniques. There have been successes, but also a lot of frustration. There are so many different factors, and the path to successful training appears to be very narrow. Training a GAN requires a lot of time and computing resources, which leads to further frustration whenever a training attempt is unsuccessful. And even if the training is technically successful, the resulting images might not match what you want to achieve.
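As an illustration of how many interacting pieces there are, here is a minimal sketch of a single training step with the standard non-saturating GAN loss. The generator G, the discriminator D (returning raw logits), the two optimisers and the latent size z_dim are all assumed to exist; each of them, plus the learning rates, batch size and update schedule, affects whether training converges at all.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim, device="cpu"):
    b = real.size(0)
    ones = torch.ones(b, 1, device=device)
    zeros = torch.zeros(b, 1, device=device)

    # Discriminator update: push real towards 1, fake towards 0.
    fake = G(torch.randn(b, z_dim, device=device)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones) +
              F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update: try to make the discriminator output 1 on fakes.
    g_loss = F.binary_cross_entropy_with_logits(
        D(G(torch.randn(b, z_dim, device=device))), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()

    return d_loss.item(), g_loss.item()
```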
Of course there are now GAN architectures and techniques which can produce quite good image quality at reasonable resolutions, but these tend to be rather expensive to use in terms of computing power, time, memory and power consumption, quite often beyond the reach of an ordinary individual artist on a meager budget.
For an individual artist starting to explore neural artmaking, the transformative techniques are usually much more effective, unless one really seeks to train a generator to mass-produce images. But that would be even more difficult, due to an effect called mode collapse: the GAN fails to produce significant variety.
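One crude way to notice collapse in practice is that the variety within a batch of generated samples shrinks, which shows up as the average pairwise distance between samples dropping towards zero. A sketch, again assuming a generator G and latent size z_dim:

```python
import torch

def sample_diversity(G, z_dim, n=64, device="cpu"):
    # Generate a batch and measure the average pixel-space distance
    # between all distinct pairs; values near zero suggest the
    # generator keeps emitting near-identical images.
    with torch.no_grad():
        x = G(torch.randn(n, z_dim, device=device)).flatten(1)
    d = torch.cdist(x, x)
    return (d.sum() / (n * (n - 1))).item()
```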
Nevertheless, generative GANs are an important and fascinating part of neural image-making, and I do not consider the time and effort I have spent on them wasted. Yet, looking at the successful results from my GAN work, it becomes evident that even they bear the mark of a transformative process. How is that possible? The answer is likely the phenomenon called overfitting: the neural model has learned, at least to some extent, to memorise images in the training set instead of learning to generalise.
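One rough way to check for this kind of memorisation is to compare generated samples to their nearest neighbours in the training set: very small distances mean the generator is reproducing rather than generalising. A sketch, assuming both sets are image tensors of the same shape; in practice a perceptual metric would be more telling than raw pixel distance.

```python
import torch

def nearest_neighbours(samples, train_images):
    # Flatten both sets to vectors and compute all pairwise L2
    # distances; return, for each sample, the distance to and the
    # index of its closest training image.
    d = torch.cdist(samples.flatten(1), train_images.flatten(1))
    return d.min(dim=1)
```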
So here, too, I am really balancing between the generative and the transformative. I do not really want the generator to produce totally independent representations, and stylistically, I do not want the photorealistic quality of the training dataset either. When everything goes well, which unfortunately is quite seldom, the GAN produces interesting variations on my photo set in a style that I accept: a synthetic style which is not based on any model but is instead a result of imperfect learning within the GAN.
Somewhat paradoxically, then, I find myself using generative GANs in a transformative way. Yet, unlike purely transformative work, where I can proceed directly to expressing myself, with a generative GAN the emphasis is always first on exploration: embarking on a journey without knowing what, if anything, there is to find.