Using nin-imagenet-conv in neural-style

Many people appear to have problems getting neural-style to run, and this is often because they try to use it with limited memory, such as 2 GB. My own tests indicate that neural-style with the default VGG19 network requires at least 3 GB of memory. Lowering image_size does not help, as there is a memory peak during initialization which appears to be unrelated to image_size. My tests were conducted on CPU, so results may vary on a GPU, but in practice it seems that neural-style with VGG19 cannot be run in only 2 GB of memory.

There is an alternative to VGG19, nin-imagenet-conv, which runs quite comfortably even under 2GB of memory. You can download the files from . Download nin_imagenet_conv.caffemodel and train_val.prototxt into your models/ folder.

You can then run neural-style as follows (adjust the -gpu parameter if you want to run using GPU):

th neural_style.lua -gpu -1 -print_iter 1 -save_iter 50 -num_iterations 2000 -image_size 512  -output_image nintest.png -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt -content_layers relu7 -style_layers relu1,relu3,relu5,relu7,relu9

On my CPU, this uses only around 0.8 GB of reserved memory and produces the following image in 1000 iterations.


With image_size increased to 960, it still runs in under 3 GB and produces the following picture in 1000 iterations.


The content and style weights can be changed as usual. Likewise, one can experiment with the layers: relu1, relu2, relu3, relu5 (there is no relu4), relu6, relu7, relu8, relu9, relu10, relu11 and relu12. Running the first example again with image_size 512 but content layer relu3, we get the following result in 1200 iterations while using only slightly more than 0.8 GB of memory.
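To illustrate, the relu3 variation differs from the first command only in the -content_layers flag. The command is assembled and echoed here as a sketch (the output file name is a placeholder; actually running it requires Torch and the downloaded model files):

```shell
# Same invocation as before, with the content layer switched from relu7 to relu3.
# Echoed rather than executed, since the real run needs Torch and the model files.
CMD="th neural_style.lua -gpu -1 -print_iter 1 -save_iter 50 -num_iterations 2000 \
 -image_size 512 -output_image nintest_relu3.png \
 -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt \
 -content_layers relu3 -style_layers relu1,relu3,relu5,relu7,relu9"
echo "$CMD"
```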


Many different results are possible by tweaking the parameters. The results will not be identical to those achieved with VGG19 but they should still be interesting and useful.

[Example output images with varying content and style weight settings]

For more examples using nin-imagenet-conv, see my earlier post Switching to a smaller net.










  2. Hello!
    Very interesting, but could you please explain why “no relu4”?
    Why do you not advise using relu4?

    And what about relu0? You didn’t say anything about it.

    And a last question: will using more layers give better quality?
    May I use -content_layers relu0,relu1,relu2,relu3,relu5,relu6,relu7,relu8,relu9,relu10,relu11,relu12 -style_layers relu0,relu1,relu2,relu3,relu5,relu6,relu7,relu8,relu9,relu10,relu11,relu12

    • I don’t know why the model has no layer named relu4. It is up to whoever made the model to name the layers.

      Then, as to whether including more layers gives more quality: in general, the lowest layers contain actual pixel data while the upper layers represent more abstract features like lines and curves, with increasing complexity. One might say that the lower layers represent the actual pixels and shapes, while the uppermost layers represent more what the neural network thinks it sees in the picture. The lower layers stay faithful to the source image; the higher layers may add something of a creative element based on the material on which the neural network has been trained.

      How this affects the resulting image can only be found out by trying. Yes, you can try all of those; I don’t remember whether there is a relu0, but I would guess so. Have a look in the prototxt file; all layers are described there.
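      For example, one can list the layer names straight from the prototxt. A tiny stand-in excerpt is created below so the command is self-contained; in a real setup you would point the grep at models/train_val.prototxt:

```shell
# Create a two-layer stand-in excerpt (the real train_val.prototxt is much longer).
cat > /tmp/excerpt.prototxt <<'EOF'
layers { name: "relu0" type: RELU }
layers { name: "relu1" type: RELU }
EOF
# Pull out the layer names; this lists every reluN the file actually defines.
grep -o 'name: "relu[0-9]*"' /tmp/excerpt.prototxt
```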

      One additional thing to note when starting to experiment: have a look at the -print_iter output. You may notice that the loss values for the higher layers are quite small, which could mean that their effect is smaller. If you include all of them, the lower ones will probably dominate.

      Neural-style does not allow using different weights for different layers (although I have been thinking of making a modified version to try that). But if you really want to test what each layer contributes, I would suggest using only some of the layers in isolation, increasing the weights to compensate for the decrease when using a higher layer.
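      A sketch of that kind of isolation test, with illustrative placeholder weights (not tested values); the commands are only assembled and printed here, since actually running them needs Torch and the model files:

```shell
# Try single style layers in isolation. Higher layers tend to give smaller losses,
# so the weight is raised for relu9 to compensate; both values are placeholders.
for layer in relu1 relu9; do
  weight=100
  [ "$layer" = "relu9" ] && weight=500
  echo "th neural_style.lua -gpu -1 -image_size 512 \
 -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt \
 -content_layers relu7 -style_layers $layer -style_weight $weight \
 -output_image nin_style_${layer}.png"
done
```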

      • One more thing. Neural-style creates the output image by trying to find a minimum for the combined loss over all the content and style layers you have selected. The different layers can be pulling in different directions. In fact, the content and style layers are pulling in different directions by design, the one set favoring the original image and the other favoring its stylistic features (color and texture). But it may also happen that different style layers (one from the bottom and one from the top) start fighting each other.

        Fortunately, one can monitor what is happening using -print_iter 1.

      • Thanks so much! I have looked in the prototxt file and found that relu4 doesn’t exist!
        If I use -content_layers relu0,relu1,relu2,relu3,relu4,relu5,relu6,relu7,relu8,relu9,relu10,relu11,relu12
        then the program doesn’t finish with an error, but it uses only relu0, relu1, relu2 and relu3! Understood.
        What I want to know is this: if we use only relu7 as the content layer in one experiment and then only relu3 in another, we use very different levels of abstraction! So the resulting images should be very different. But they are not that different…
        And what is the benefit of not using all layers? I see that using more layers increases the visually apparent weight (of content or style), and that can be compensated by lowering the weights. But if we omit some layers, we omit some abstractions. Is that good?

        • Is it good? That is completely up to what you are after, and there is no way other than experimenting to find out.

          As to “very different levels of abstraction”, I think the level changes very little from one layer to the next. That is also why we don’t need to use all layers all the time. To get very different results, one should start by comparing the opposites, relu1 and relu12 for instance, and adjust the weights to keep the losses at roughly the same level.

          More abstraction is not necessarily better quality, either. Taken to the extreme, the network starts to draw shapes which aren’t in the original at all (as in Google’s inceptionism). See also my recent posts where I have modified neural-style to use the most abstract fc layers.

          When experimenting, it may help to change only one parameter at a time (except possibly adjusting weights when changing layers). So I usually keep to a single content layer when trying out style layers and vice versa. Note also that the defaults for VGG19 use only a single content layer while having multiple style layers. That may be a good practice.

  3. But nin-imagenet-conv is no direct replacement, it produces different-looking results and requires tweaking the settings.

    • Who said that nin-imagenet-conv would produce identical results to VGG19-ILSVRC? Neural-style works with many different models, and they make different-looking images and respond differently to the settings. That is also what my post is about.

      Why is that? Neural-style is not really an algorithm but a system that uses a trained model to perceive what each image is about, and to use that perception to create a new image. Each model, trained on different material, responds differently, just as two human beings respond differently to images.

      One great thing about neural-style is the possibility of using several different models and also, eventually, training one’s own. Anyone who wants what VGG19-ILSVRC produces should just use that model, and get more memory if they want larger images.
