{"id":859,"date":"2021-08-24T14:18:10","date_gmt":"2021-08-24T14:18:10","guid":{"rendered":"http:\/\/liipetti.net\/visual\/?p=859"},"modified":"2021-08-25T09:16:34","modified_gmt":"2021-08-25T09:16:34","slug":"staying-visual-in-a-clip-world","status":"publish","type":"post","link":"https:\/\/liipetti.net\/visual\/staying-visual-in-a-clip-world\/","title":{"rendered":"Staying visual in a CLIP world"},"content":{"rendered":"\n<ul class=\"wp-block-gallery columns-1 is-cropped ht-narrow-gallery wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"592\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/k2006-391-300-kopio.png\" alt=\"\" data-id=\"798\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/k2006-391-300-kopio\/\" class=\"wp-image-798\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/k2006-391-300-kopio.png 800w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/k2006-391-300-kopio-300x222.png 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/k2006-391-300-kopio-768x568.png 768w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/k2006-391-300-kopio-150x111.png 150w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure><\/li><\/ul>\n\n\n\n<p><br>When CLIP-based image synthesis started to emerge in January\/February 2021, I was torn between curiosity and perplexity. Suddenly it was possible to synthesise images from mere text prompts. On the one hand, I was really in need of something new to refresh my image creation process (mainly GAN-based, see <a href=\"http:\/\/liipetti.net\/visual\/taming-the-gan\/\">my article<\/a>). On the other hand, I had moved into visual arts in order to take distance from the written word, which had always come to me so naturally.
For me, turning to the visual meant turning away from analytical thought, towards immediacy, feeling and intuition. Generating images from text ran counter to thinking visually, let alone feeling visually.<\/p>\n\n\n\n<p>Anyhow, I did my own experiments and gradually started to look for ways to use CLIP while retaining as much visual control as possible. At first this involved adding visual inputs to a process otherwise dominated by CLIP; later on it even meant moving CLIP into a secondary role in a mainly visual process.<\/p>\n\n\n\n<p>An obvious way to achieve visual control is to start the process from a given image. This alone, together with a moderate learning rate and other suitable settings, is enough to keep CLIP close to the original image while developing it in the direction given by the prompt. In the following video, the starting images were black and white photos taken on film.<br><br><\/p>\n\n\n<p><iframe loading=\"lazy\" title=\"cliv180521b-Antilalias MPEG-4 720p.mp4\" src=\"https:\/\/player.vimeo.com\/video\/591593881?dnt=1&amp;app_id=122963&amp;h=6f6a50e973\" width=\"400\" height=\"225\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n\n\n\n<p><br>Alternatively, especially when there is no direct way to initialise the image generator with a given image, one can add a term to the objective function that penalises distance from the original image, so that the process seeks a balance between textual and visual control.<\/p>\n\n\n\n<p>For someone like me, with a preference for transformative techniques (see <a href=\"http:\/\/liipetti.net\/visual\/transformative-and-generative-in-neural-art\/\">my article<\/a>) over purely generative ones, using CLIP in combination with my own image materials turned out, at times at least, to excel at style transfer. Even if the style most often did not feel my own.
Still doesn&#8217;t.<br><br><\/p>\n\n\n\n<ul class=\"wp-block-gallery columns-2 is-cropped ht-narrow-gallery wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/198695210_10159533830233729_2033333406138993092_n.jpg\" alt=\"\" data-id=\"784\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/198695210_10159533830233729_2033333406138993092_n\/\" class=\"wp-image-784\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/198695210_10159533830233729_2033333406138993092_n.jpg 800w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/198695210_10159533830233729_2033333406138993092_n-150x150.jpg 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/198695210_10159533830233729_2033333406138993092_n-300x300.jpg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/198695210_10159533830233729_2033333406138993092_n-768x768.jpg 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E3nRBVIWEAAEy-s.png\" alt=\"\" data-id=\"783\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/e3nrbviweaaey-s\/\" class=\"wp-image-783\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E3nRBVIWEAAEy-s.png 800w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E3nRBVIWEAAEy-s-150x150.png 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E3nRBVIWEAAEy-s-300x300.png 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E3nRBVIWEAAEy-s-768x768.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure><\/li><\/ul>\n\n\n\n<p>Here, I used a set of photos 
taken at home, originally meant to look a bit like landscapes, and developed them further in that direction using CLIP. Still, these, like the examples above, are mere experiments to me.<br><br><\/p>\n\n\n<p><iframe loading=\"lazy\" title=\"outofwhatever\" src=\"https:\/\/player.vimeo.com\/video\/591602805?dnt=1&amp;app_id=122963&amp;h=1b279140fc\" width=\"400\" height=\"225\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n\n\n\n<p><br>I have also experimented with using CLIP with other types of image generators besides the now ubiquitous VQGAN, including hooking CLIP up to my own GANs. Yet it was a totally untrained GAN generator that gave me the most interesting results. Its generative capabilities are very limited, yet CLIP keeps trying to push it further, resulting in a distinctive style. Here I feel that I am coming onto my own ground: I can relate to these images in a personal way, even as my own works.<br><br><\/p>\n\n\n\n<ul class=\"wp-block-gallery columns-3 is-cropped ht-narrow-gallery wp-block-gallery-3 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fJM4eXMAMXLDV-1024x1024.jpeg\" alt=\"\" data-id=\"801\" data-link=\"http:\/\/liipetti.net\/visual\/e2fjm4exmamxldv\/\" class=\"wp-image-801\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fJM4eXMAMXLDV-1024x1024.jpeg 1024w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fJM4eXMAMXLDV-150x150.jpeg 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fJM4eXMAMXLDV-300x300.jpeg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fJM4eXMAMXLDV-768x768.jpeg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" 
decoding=\"async\" width=\"1024\" height=\"1024\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/iniz-60-1024x1024.jpg\" alt=\"\" data-id=\"802\" data-link=\"http:\/\/liipetti.net\/visual\/iniz-60\/\" class=\"wp-image-802\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/iniz-60-1024x1024.jpg 1024w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/iniz-60-150x150.jpg 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/iniz-60-300x300.jpg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/iniz-60-768x768.jpg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/inic-50-1024x1024.jpg\" alt=\"\" data-id=\"803\" data-link=\"http:\/\/liipetti.net\/visual\/inic-50\/\" class=\"wp-image-803\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/inic-50-1024x1024.jpg 1024w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/inic-50-150x150.jpg 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/inic-50-300x300.jpg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/inic-50-768x768.jpg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fEdPNX0AEG7cX.jpeg\" alt=\"\" data-id=\"800\" data-link=\"http:\/\/liipetti.net\/visual\/e2fedpnx0aeg7cx\/\" class=\"wp-image-800\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fEdPNX0AEG7cX.jpeg 1024w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fEdPNX0AEG7cX-150x150.jpeg 150w, 
https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fEdPNX0AEG7cX-300x300.jpeg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/E2fEdPNX0AEG7cX-768x768.jpeg 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><\/ul>\n\n\n\n<p><br>I then realised that the simple &#8220;input text, wait, get an image&#8221; approach was tedious and restrictive. What if CLIP were used in an interactive session, letting you watch the image evolve while keeping control over it the whole time? Adding a new seed\/target image, applying masks to develop only a part of the image at a time, changing the prompt on the fly, and so on. So I made Picsyn, and it was a totally different experience. I could start from an image with a prompt, then at some point make it develop into something more ice-like, then towards Giacometti, not all the way but just a bit. The possibilities felt endless, and they still are, even if the process remains quite simple and straightforward. It can yield quite satisfying final images, but it can also drive a realtime performance in which the image is constantly evolving under live control. The example below is purely experimental, but I am sure that with this interactive process I would be able to also make works that are really my own. <br><br><\/p>\n\n\n<p><iframe loading=\"lazy\" title=\"wfef\" src=\"https:\/\/player.vimeo.com\/video\/591597380?dnt=1&amp;app_id=122963&amp;h=732ec8f591\" width=\"360\" height=\"360\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n\n\n\n<p><br>Eventually, I want to tame CLIP into just another tool in my toolbox. I have resumed my GAN-based work, and even made some progress in getting it to work better than before. I have experimented with using CLIP both in pre- and post-processing the images in an otherwise GAN-based process. 
Modifying the dataset images, maybe ever so slightly, and maybe smoothing out the GAN output, but only where it works in context. With this process, and with images like those shown below, I am squarely within my own territory: I have enough visual control, and the resulting style is my own.<br><br><\/p>\n\n\n\n<ul class=\"wp-block-gallery columns-2 is-cropped ht-narrow-gallery wp-block-gallery-4 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltmix000d-artistmaltin.png\" alt=\"\" data-id=\"815\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/maltmix000d-artistmaltin\/\" class=\"wp-image-815\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltmix000d-artistmaltin.png 800w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltmix000d-artistmaltin-150x150.png 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltmix000d-artistmaltin-300x300.png 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltmix000d-artistmaltin-768x768.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltinr002e.png\" alt=\"\" data-id=\"816\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/maltinr002e\/\" class=\"wp-image-816\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltinr002e.png 800w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltinr002e-150x150.png 150w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltinr002e-300x300.png 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/maltinr002e-768x768.png 768w\" sizes=\"auto, 
(max-width: 800px) 100vw, 800px\" \/><\/figure><\/li><\/ul>\n\n\n\n<p><br>Another way of using CLIP output as raw material is mixed media and collage. For the following picture, I used CLIP in an interactive process to make an image, printed it on fine art paper, and then attached a black and white large format (4&#215;5&#8243;) film negative on top of the image. Here, again, I am in experimental though very promising territory: again something I can relate to and have enough control over.<br><br><\/p>\n\n\n\n<ul class=\"wp-block-gallery columns-1 is-cropped ht-narrow-gallery wp-block-gallery-5 is-layout-flex wp-block-gallery-is-layout-flex\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"http:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/DSC05539-2-1024x683.jpg\" alt=\"\" data-id=\"839\" data-link=\"http:\/\/liipetti.net\/visual\/keeping-visual\/dsc05539-2\/\" class=\"wp-image-839\" srcset=\"https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/DSC05539-2-1024x683.jpg 1024w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/DSC05539-2-300x200.jpg 300w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/DSC05539-2-768x512.jpg 768w, https:\/\/liipetti.net\/visual\/wp-content\/uploads\/2021\/08\/DSC05539-2-150x100.jpg 150w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>When CLIP-based image synthesis started to emerge in January\/February 2021, I was torn between curiosity and perplexity. Suddenly it was possible to synthesise images from mere text prompts. 
On the one hand, I was really in need of something new to refresh my image creation process (mainly GAN based,&#8230;<\/p>\n<p class=\"continue-reading-button\"> <a class=\"continue-reading-link\" href=\"https:\/\/liipetti.net\/visual\/staying-visual-in-a-clip-world\/\">Continue reading<i class=\"crycon-right-dir\"><\/i><\/a><\/p>\n","protected":false},"author":1,"featured_media":816,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-859","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-working"],"_links":{"self":[{"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/posts\/859","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/comments?post=859"}],"version-history":[{"count":8,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/posts\/859\/revisions"}],"predecessor-version":[{"id":868,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/posts\/859\/revisions\/868"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/media\/816"}],"wp:attachment":[{"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/media?parent=859"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/categories?post=859"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/liipetti.net\/visual\/wp-json\/wp\/v2\/tags?post=859"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}