In my two previous posts, I experimented with neural-style by bringing the fully connected layers into use. This resulted in something quite different, which I have provisionally called neural-mirage. Neural-mirage looks at the uppermost fc layer, the most abstract classification of what the network thinks it sees in the image, and then creates a new image using the elements of the original image. The result is akin to breaking the original image into puzzle pieces and assembling them in random order while still getting the pieces to fit seamlessly. In addition, the image will contain details that are not present in the original but which the neural network thinks match the contents of the image. Like in “In der Altstadt”:
Applying this technique to several images reveals, though, that it can also be a handicap that neural-mirage totally loses the spatial arrangement of the original image. For example, the sky above may disappear and return as small spots here and there in the result image.
To retain some of the original spatial arrangement, I made further experiments that include the conv layers in the content loss calculation. So, in effect the solution is steered by three factors:
- The spatial contents of the original image, as represented by one or more conv layers.
- The abstract contents of the original image, as represented by the fully connected layers. This factor does not include any information on the spatial arrangement of the original image.
- The style elements of the original image, as represented by the Gramian matrices computed on selected conv layers (just like in neural-style and the original paper by Leon Gatys et al.). Note also that here the style elements are derived from the same image as the content.
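As a rough illustration of how these three factors could be combined, here is a minimal NumPy sketch. It is not the actual neural-style code (which is written in Lua/Torch): the function names, the dictionary keys, the use of mean squared error for every term, and the Gram matrix normalization are all assumptions made for the example. The activations are assumed to have already been extracted from the network.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) activation map.

    Dividing by the number of spatial positions is an assumption;
    implementations differ in how they normalize.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def mse(a, b):
    """Mean squared difference between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def combined_loss(generated, target, w_spatial=1.0, w_abstract=1.0, w_style=1.0):
    """Weighted sum of the three factors described above.

    `generated` and `target` are dicts with hypothetical keys:
      "conv":  (C, H, W) activations of the conv layer used for spatial content
      "fc":    (N,) activations of the fully connected layer
      "style": list of (C, H, W) activations of the style layers
    """
    spatial = mse(generated["conv"], target["conv"])
    abstract = mse(generated["fc"], target["fc"])
    style = sum(
        mse(gram_matrix(g), gram_matrix(t))
        for g, t in zip(generated["style"], target["style"])
    )
    return w_spatial * spatial + w_abstract * abstract + w_style * style
```

Raising `w_spatial` relative to the other two weights pulls the result towards the original layout, while lowering it lets the other two terms dominate. Note that the Gram matrix sums over all spatial positions, so shuffling the positions of a feature map leaves it unchanged; the style term therefore carries no layout information of its own.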
Furthermore, the weight of each of these three factors is adjustable. If the spatial content is emphasized too much, the resulting picture becomes an almost photorealistic copy of the original:
On the other hand, if the spatial content is given too little weight, the image will be filled with details that are, to a significant degree, independent of the layout of the original image.
Finding the right balance is quite critical.
Interestingly, the neural network may replace the dome of Sacre Coeur in the background with a cone:
Or replace the church altogether with towers, of a fortress maybe?
All these images have been created running only 100 to 300 iterations. It is also interesting to monitor the images created early in the iteration chain… these sometimes take on the quality of a painting:
There is much room for experimentation and so far I have only covered a small part. For instance, in the above examples the spatial control has been taken from conv1_1, the lowest conv layer, where information is still very close to the actual pixel content. Moving to the higher conv layers, the information becomes more abstract, representing, for instance, lines and shapes. Using conv5_1 to control spatial content can give an image like this.
The original spatial layout is still there, but the actual shapes of the original have been replaced with other shapes. The change from conv1_1 to conv5_1 is more drastic than I would have expected. Perhaps this is because there is now nothing to steer the image towards the original pixel content. The network is then free to paint using whatever it has learned while looking at hundreds of thousands of images.
So far I have not used any style image; all style information has been extracted from the same image as the content. It is, of course, fully possible to use additional style images too, further increasing the number of options and adjustments.