The following instructions are for creating your own animations using the style transfer technique described by Gatys, Ecker, and Bethge, and implemented by Justin Johnson. For an example of such an animation, see this video of Alice in Wonderland re-styled by 17 paintings.
The easiest way to set up the environment is to load Samim's pre-built Terminal snap or use another cloud service such as Amazon EC2. Unfortunately, the g2.2xlarge GPU instances cost $0.99 per hour, and depending on the parameters selected, it may take 10-15 minutes to produce a 512px-wide image, so generating 1 second of video at 12fps (12 frames, or 2-3 GPU-hours) can cost $2-3.
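The cost estimate above is simple arithmetic, sketched here in Python so you can plug in your own numbers. The function name and its parameters are illustrative only, and the sketch assumes frames are rendered one at a time on a single instance billed at a flat hourly rate.

```python
def cost_per_video_second(minutes_per_frame, fps=12, dollars_per_hour=0.99):
    """Estimated cost in USD of rendering one second of video.

    Assumes frames are rendered sequentially on one GPU instance
    that is billed at a flat hourly rate.
    """
    gpu_hours = minutes_per_frame * fps / 60.0
    return gpu_hours * dollars_per_hour


if __name__ == "__main__":
    for minutes in (10, 15):
        cost = cost_per_video_second(minutes)
        print(f"{minutes} min/frame: ${cost:.2f} per second of video")
```

Lowering the framerate or the output width reduces the cost proportionally to the time saved per second of video.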
If you do load the Terminal snap, make sure to run git pull
to update to the latest version of neural-style.
Alternatively, you can set it up on your own computer, though the process is a bit tedious (especially installing CUDA), and a laptop GPU generally lacks the VRAM to run the network without running out of memory. To set it up on your own computer, it is best to follow the concise instructions found here.
Once you have neural-style set up, download the Python script attached to this gist and install Pillow by running:
pip install Pillow
The script is used to blend each input image with the previously generated image. This helps to tease out the same features in successive frames and improves the consistency and smoothness of the frames. There is likely a much more effective way to do this, but blending is a cheap and easy solution to noisy features.
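To make the blending concrete, here is a minimal sketch of what such a blend might look like with Pillow. It is not the gist's script: the function names are hypothetical, and the resize step is an assumption added because Pillow's Image.blend requires both images to have the same size and mode.

```python
def blend_value(previous, current, alpha):
    """Weighted average of one channel value.

    alpha is the weight of the current input frame; (1 - alpha) is the
    weight of the previously generated frame.
    """
    return (1.0 - alpha) * previous + alpha * current


def blend_frames(previous_output_path, next_input_path, blended_path, alpha=0.95):
    """Blend the previous generated frame into the next input frame."""
    # Imported here so the arithmetic above can be tried without Pillow installed.
    from PIL import Image

    previous = Image.open(previous_output_path).convert("RGB")
    current = Image.open(next_input_path).convert("RGB")
    # Image.blend needs equal sizes; the generated frame may have been resized.
    if previous.size != current.size:
        previous = previous.resize(current.size)
    # Image.blend computes previous * (1 - alpha) + current * alpha per pixel.
    Image.blend(previous, current, alpha).save(blended_path)
```

With alpha set to 0.95, a channel value of 200 in the previous output and 100 in the new input blends to 105, so the new frame dominates while a faint trace of the previous result remains.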
Next, install ffmpeg. If you have Homebrew installed, the easiest way is to run:
brew install ffmpeg
Choose a source video file, e.g. myMovie.mp4, and use ffmpeg to extract the frames and raw audio from it. You can use any framerate you like, but I found it better to keep the framerate low, e.g. 12fps, to reduce noisiness in the output frames, and it also reduces the amount of computation needed.
ffmpeg -i myMovie.mp4 -r 12 -f image2 image-%05d.jpg
ffmpeg -i myMovie.mp4 rawaudio.wav
Now the fun part: choosing style images to transform the content. In principle, any style image works, but the best style images are those with strong textural components and patterns. A good place to start for ideas is Kyle McDonald's style studies, which contain several dozen examples of style images picked from Western art history.
Additionally, the neural-style repository itself contains examples in its README, as do the other implementations of the algorithm, including those by Kai Sheng Tai and Anders Larsen. Lastly, many people have been posting images on Twitter with the hashtag #stylenet, and the Twitter account DeepForger has many examples as well.
Once you've selected a style image, e.g. myStyle.jpg, you can generate an initial image from your first frame (image-00001.jpg) using neural_style.lua:
th neural_style.lua -style_image myStyle.jpg -content_image image-00001.jpg -output_image generated-00001.jpg
This uses the default parameters, which work reasonably well, but you may want to experiment with the other parameters, which shift the emphasis between content and style, and with the style_scale factor, which controls the scale at which the network samples from the style image.
See the README at neural-style for more info about parameter selection.
Once you have your initial output frame, rather than generating the second frame directly from the second input frame, I found it was useful to partially blend the previous generated image into the next input frame, perhaps at no more than 5%. This helps to tease out the same high-level features in the second frame as were discovered in the first, and makes the video smoother and more consistent in high-level features (low-level features are still fairly noisy).
So blend the previous output into the next input, and then generate the next output from the resulting blended image. This example uses an alpha value of 0.95 for the second input frame (a 5% blend of the first generated frame). Replace <script>.py with the filename of the Python script downloaded from this gist.
python <script>.py --input_1 generated-00001.jpg --input_2 image-00002.jpg --output blended-00002.jpg --alpha 0.95
th neural_style.lua -style_image myStyle.jpg -content_image blended-00002.jpg -output_image generated-00002.jpg
Repeat this process for as many frames as you have. The easiest approach is to write a bash script to automate it.
Update: neural-style now supports setting a random number seed in Torch which should improve frame-to-frame consistency, potentially reducing or eliminating the need for the blending step.
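As an alternative to a bash script, the frame-by-frame loop can be sketched in Python. This is only a sketch under stated assumptions: the function names are hypothetical, blend_script is a placeholder for the filename of the Python script from this gist, and the -seed flag is how I understand neural-style exposes the random seed (check its README before relying on it).

```python
import subprocess


def frame_commands(index, style_image, blend_script, alpha=0.95, seed=None):
    """Return the commands needed to produce generated frame `index` (1-based).

    blend_script is a placeholder for the path of the gist's blending script.
    """
    content = f"image-{index:05d}.jpg"
    generated = f"generated-{index:05d}.jpg"
    commands = []
    if index > 1:
        # Blend the previous generated frame into this input frame first.
        blended = f"blended-{index:05d}.jpg"
        commands.append([
            "python", blend_script,
            "--input_1", f"generated-{index - 1:05d}.jpg",
            "--input_2", content,
            "--output", blended,
            "--alpha", str(alpha),
        ])
        content = blended
    style_command = [
        "th", "neural_style.lua",
        "-style_image", style_image,
        "-content_image", content,
        "-output_image", generated,
    ]
    if seed is not None:
        # Assumed flag name for the seed option; verify against the README.
        style_command += ["-seed", str(seed)]
    commands.append(style_command)
    return commands


def render_all(num_frames, style_image, blend_script, **options):
    """Run the blend and style-transfer steps for frames 1..num_frames in order."""
    for index in range(1, num_frames + 1):
        for command in frame_commands(index, style_image, blend_script, **options):
            subprocess.run(command, check=True)
```

Calling render_all(120, "myStyle.jpg", "path/to/script.py") would process 120 frames in sequence, stopping at the first command that fails. Frames must be rendered in order because each blend depends on the previous generated frame.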
Now that you have all the frames, you can use ffmpeg to combine the generated frames with the raw audio initially extracted from the source video into the final movie, written here to generated.mp4.
ffmpeg -framerate 12 -i generated-%05d.jpg -i rawaudio.wav -c:v libx264 -pix_fmt yuv420p generated.mp4
And voila, style-transfer video.