I implemented Rosca, Mihaela, et al. "Variational Approaches for Auto-Encoding Generative Adversarial Networks" in PyTorch. It's a modular implementation: plug in any torch modules as the encoder, generator, discriminator and code discriminator.
These are two animations I made for FS35 with Jodie Mack in winter 2015.
Bear Cult was inspired by a collection of Joseph Campbell's essays I found in a used book store, specifically one entitled "Renewal Myths and Rites of the Primitive Hunters and Planters". It's about the prehistoric origins of myth. In Campbell's telling, the purpose of myth is to conquer death. Campbell cites preserved arrangements of cave bear bones as evidence for the ancient roots of a bear-baiting ritual still practiced by indigenous peoples in northern Japan. In the bear-baiting ritual, a young bear is captured, raised in captivity, and then ritually tormented, killed and eaten. The bear is believed to contain the soul of a demigod which yearns to be released from its fleshy bear-prison. For Campbell, this is the coping mechanism of an animal which kills to survive but understands death. It's the hunter's moral justification for killing: death isn't terminal, killing is a kindness. Ritualized death is a point of contact with the numinous; the fear of death is transmuted to awe. I found the deep connection Campbell makes between this sacred feeling and human capacity for cruelty heartbreaking.
I decided to depict a bear-baiting ritual. Not the specific cultural practice Campbell recounts, but the abstracted elements of a hypothetical prehistoric bear cult. The body of the bear would be made by hand with paper drawings and cutouts, while the soul would be made in software with virtual video feedback. These material and digital components would interact via fluorescent orange paper which could be keyed out in video editing software. A possible alternate title: "Mind-Body Chromakey". Originally I had storyboarded a longer narrative including human hands and spears piercing the bear and the escape of its soul, but only part proved feasible in the time I had.
I'm happy with how the bear soul material turned out. After shooting the paper animation, I exported just the key as a binary mask. As an input to the feedback process, the mask lets the roiling cloud-forms conform to the geometry of the bear eyes and nose-hole. Unfortunately the black-on-black paper was unwise; there's a lot of camera noise after adjusting the levels to make everything visible. A darker background of black felt or something might have worked better.
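For the curious, here is a minimal sketch of the keying step: it turns a frame into a binary mask by measuring each pixel's color distance to the key. It assumes frames are nested lists of (r, g, b) floats in [0, 1]. The orange key value and the threshold are hypothetical, and real keyers in video editing software are more sophisticated than a plain distance test.

```python
import math

# Hypothetical key color: a fluorescent orange in normalized RGB.
KEY_ORANGE = (1.0, 0.45, 0.0)

def key_mask(frame, key=KEY_ORANGE, threshold=0.3):
    """Return a binary mask (1 = keyed pixel) for a frame of (r, g, b) tuples."""
    mask = []
    for row in frame:
        mask_row = []
        for pixel in row:
            # Euclidean distance in RGB space to the key color (math.dist needs Python 3.8+)
            dist = math.dist(pixel, key)
            mask_row.append(1 if dist < threshold else 0)
        mask.append(mask_row)
    return mask
```

A mask like this could then be handed to a feedback process as an extra input. For example, the feedback state might only be allowed to evolve where the mask is 1, which would make the forms fill the keyed shapes.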
This project taught me a lesson about how sound influences the perception of time in film. What felt like a good visual rhythm when I was silently animating seems weirdly abbreviated with the soundtrack added. The sound was a bit rushed, but I like the general effect. It's made of recordings of bear vocalizations and carcass-mangling sounds, chopped up into phase-rhythms with the same methods I used for SIBA. I'd like to revisit this project someday.
HOFSTADTERPILLAR is a many-looped negotiation between rules, materials and intuition, an effort to explore Hofstadter's idea of the strange loop by drawing.
Drawing is an act of alternating abstraction and reification, recognition and construction of forms among materials. I draw, I see what I have drawn, I draw more. Animating by hand can be seen as a sort of image feedback loop; each frame is drawn with reference to previous frames, which are visible beneath transparent cels. I developed loose sets of rules to draw by, of the type "a silver line extends and curves more each frame". Other rules emerge from the materials; paint markers pool and smear when the next cel is placed on top. I used the "maximum cycle" technique of repeatedly drawing new material into the same set of frames to construct rich and evolving loops. The sound is another set of feedback loops, a Max doodle with chaotic noise generators stimulating banks of tuned filters.
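The Max patch itself isn't reproduced here, but the general idea of chaotic noise exciting tuned filters fits in a few lines. This is a hypothetical stand-in, not the original doodle. A logistic map (r = 3.99, in its chaotic regime) supplies the "noise", and each filter is a two-pole resonator tuned to a single frequency.

```python
import math

def logistic_noise(n, r=3.99, x0=0.4):
    """Deterministic but chaotic 'noise' from the logistic map, centered near zero."""
    out, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x - 0.5)
    return out

def resonator(signal, freq, sr=44100, radius=0.999):
    """Two-pole resonant filter; poles at radius * e^(+-iw), so it rings at `freq` Hz."""
    w = 2.0 * math.pi * freq / sr
    a1 = 2.0 * radius * math.cos(w)
    a2 = -radius * radius
    y1 = y2 = 0.0
    out = []
    for x in signal:
        # (1 - radius) scales the input down to keep the peak gain modest
        y = (1.0 - radius) * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def filter_bank(signal, freqs, sr=44100):
    """Run the same excitation through one resonator per frequency and average them."""
    outs = [resonator(signal, f, sr) for f in freqs]
    return [sum(samples) / len(freqs) for samples in zip(*outs)]

noise = logistic_noise(4410)
tone = filter_bank(noise, [220.0, 277.2, 329.6])
```

Swapping the map parameter or the tunings changes the character a lot, which is roughly the kind of knob-twiddling a patch like this invites.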
In 2014 I spent a lot of time working through a Steve Reich fascination by making tape music. This piece was eventually presented at ICMC 2015. SIBA I|II|III stands for "Studies In Being Alive One Two and Three", which was a joke about the lack of purpose I felt at the time and also a genuine attempt to describe what I was groping toward, stylized to reflect the asemic repetitious sound-destruction therein. I don't know how to feel about it.
Douglas Hofstadter's Gödel, Escher, Bach contains about a hundred captivating ideas, and just one of them is the concept of video feedback: connect a video camera to a screen, and then point it at the screen. Weird stuff happens on the screen as light is projected through the air, captured by the camera and circled back to the screen, mutating each time. Zoom out to get an infinite hall of mirrors effect, zoom in to get kaleidoscopic pulsations, loosen the focus to get bubbling slime.
During college I became interested in computer graphics, particularly the idea of procedural graphics: producing images with only code, as opposed to virtually drawing (as with Photoshop) or digitizing reality (as with a digital camera). Procedural graphics can translate some of the tantalizing infinites of mathematics into rich sensory experiences! A brief article by Iñigo Quílez introduced me to the idea of turning a random number generator into an endless alien terrain. One popular way to do this is with fragment shaders, specialized programs for doing pixel-by-pixel graphics with the GPUs found in most computers. For more than a few examples, see Shadertoy.
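As a taste of the random-numbers-to-terrain idea, here is a minimal sketch of one classic recipe. It uses value noise (random heights at integer lattice points, smoothly interpolated) summed over several octaves, often called fractal Brownian motion. It produces a 1-D terrain profile in plain Python rather than a shader, and the names and parameters are illustrative; they are not taken from Quílez's article.

```python
import math
import random

def make_lattice(seed=1, size=256):
    """Random heights at integer points; the seed fixes the 'world'."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

def value_noise(x, lattice):
    """Smoothly interpolate between the random values at floor(x) and floor(x) + 1."""
    i = math.floor(x)
    t = x - i
    a = lattice[i % len(lattice)]
    b = lattice[(i + 1) % len(lattice)]
    s = t * t * (3.0 - 2.0 * t)  # smoothstep: slope stays continuous at lattice points
    return a + (b - a) * s

def fbm(x, lattice, octaves=5):
    """Sum octaves of noise, each twice the frequency and half the amplitude of the last."""
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0, 0.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency, lattice)
        norm += amplitude
        amplitude *= 0.5
        frequency *= 2.0
    return total / norm  # weighted average, so still within [0, 1]

lattice = make_lattice(seed=42)
profile = [fbm(x * 0.05, lattice) for x in range(80)]
for height in profile[::8]:
    print("#" * int(height * 40))
```

A shader version would do the same per pixel in 2-D, typically replacing the lookup table with a hash of the lattice coordinates.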
You can put a fragment shader into feedback just like a video camera. Pixels go in, pixels come out, pixels go back in 60 times per second. The shader determines how colors mutate and interact with nearby colors each time. Here are a few of mine:
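To make "pixels go back in" concrete, here is a CPU sketch of the loop with a single grayscale channel. The `shade` function plays the role of a fragment shader: it sees only the previous frame and the coordinates of the pixel it's computing. The particular rule is a made-up example, not one of the shaders referred to above. It applies a five-tap blur, a gain of 1.1 and a small offset, then wraps the result into [0, 1).

```python
def shade(prev, x, y):
    """Per-pixel rule: the new value depends on a neighborhood of the previous frame."""
    h, w = len(prev), len(prev[0])
    # Sample neighbors with wraparound, like a repeating texture
    left = prev[y][(x - 1) % w]
    right = prev[y][(x + 1) % w]
    up = prev[(y - 1) % h][x]
    down = prev[(y + 1) % h][x]
    center = prev[y][x]
    blurred = 0.2 * (left + right + up + down + center)
    # Mutate the color and wrap it back into [0, 1) so it never blows up
    return (blurred * 1.1 + 0.01) % 1.0

def step(prev):
    """Run the 'shader' once for every pixel to produce the next frame."""
    h, w = len(prev), len(prev[0])
    return [[shade(prev, x, y) for x in range(w)] for y in range(h)]

def run(frame, n_frames):
    """Feed each output frame back in as the next input."""
    for _ in range(n_frames):
        frame = step(frame)
    return frame
```

On a GPU the same function runs for every pixel in parallel, and the output texture becomes the input texture of the next frame, commonly by ping-ponging between two framebuffers.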
Other things which resemble digital video feedback:
- 2D cellular automata like Conway's Game of Life
- Finite element physical simulations of fluids and reaction-diffusion systems
- Convolutional neural network visualization techniques like deep dream
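The resemblance is easy to see in code. A Game of Life generation is just another "new pixel from old neighborhood" rule, iterated. Here is a minimal sketch on a wraparound grid of 0s and 1s:

```python
def life_step(grid):
    """One Game of Life generation on a toroidal grid of 0/1 cells."""
    h, w = len(grid), len(grid[0])
    new = []
    for y in range(h):
        row = []
        for x in range(w):
            # Count the eight neighbors, wrapping at the edges
            n = sum(
                grid[(y + dy) % h][(x + dx) % w]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            # Birth with exactly 3 neighbors; survival with 2 or 3
            row.append(1 if n == 3 or (grid[y][x] == 1 and n == 2) else 0)
        new.append(row)
    return new
```

The structure is the same as a feedback shader; the difference is that the state is binary and the rule is a lookup on a neighbor count instead of a blend of colors.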
Other people working with this kind of stuff:
I just can't stay away. See projects such as ABSTRACT/CONCRETE, Video Synthesis With Convolutional Autoencoders and Data Sonification Using a Cortical Representation of Sound.
In spring of 2015 I made a synthesizer called Bendy as a seminar project. A technical description and a Max/MSP implementation can be found on GitHub. Here's what it sounds like:
Years ago I got one of these cheap mixers so I could record and amplify dorm room nonsense music sessions. But in grad school I had access to better recording facilities and the mixer was gathering dust. Also at that time, cool dude Carlos Dominguez introduced me to the concept of "no-input" mixing. Normally a mixer is the middleman between an instrument or microphone and a loudspeaker or recording device. It alters and facilitates sounds, but doesn't make sound. No-input mixing is total misuse of the mixer: you plug the mixer back into itself, as its own sole input. The mixer self-oscillates and makes its own sounds. It's the same principle as an electric guitar or microphone feeding back. What's funny and exciting about the no-input mixer is how rich and diverse it sounds given that it isn't supposed to sound at all. The whole point of a mixer is to have fine control over sound routing and equalization, so even a smallish stereo mixer has endless possible configurations. And because the oscillations depend on a delicate balance of amplification, the no-input mixer is an enormously sensitive instrument. Playing one means moving a single knob very slowly to find the precise edge of chaos between two sounds; sweeping it quickly to carve a tiny blip out of a squall; listening closely enough to know when it's about to blow up. You learn the feel of a particular mixer, but it's still new every time. For some masterful no-input mixing, check out Toshimaru Nakamura.
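A toy model shows why the balance of amplification matters so much. In the sketch below, a tiny noise floor circulates through four stages:

- a delay line, standing in for the cable and the mixer's latency
- a one-pole lowpass, standing in for the channel EQ
- an inverting gain stage
- a tanh soft clipper, standing in for the mixer running out of headroom

The inversion just keeps the loop from latching onto a constant level. Everything here is a hypothetical simplification, not a model of any particular mixer. When the loop gain is below 1 the noise dies away; above 1 it grows until the clipper holds it at a self-sustaining oscillation.

```python
import math
import random

def no_input_loop(gain, n_samples=20000, delay=64, seed=0):
    """Simulate a feedback loop: delay -> lowpass -> inverting gain -> tanh."""
    rng = random.Random(seed)
    # The loop starts from a tiny noise floor; nothing is ever plugged in.
    line = [rng.uniform(-1e-3, 1e-3) for _ in range(delay)]
    lowpassed = 0.0
    out = []
    for i in range(n_samples):
        x = line[i % delay]                 # sample written `delay` steps ago
        lowpassed += 0.3 * (x - lowpassed)  # one-pole lowpass ("EQ")
        y = math.tanh(-gain * lowpassed)    # inverting gain stage that saturates
        line[i % delay] = y                 # feed the output back into the loop
        out.append(y)
    return out

def rms(signal):
    return math.sqrt(sum(s * s for s in signal) / len(signal))

print("gain 0.9, tail RMS:", rms(no_input_loop(0.9)[-2000:]))
print("gain 1.5, tail RMS:", rms(no_input_loop(1.5)[-2000:]))
```

Since the lowpass never amplifies, any gain below 1 keeps every mode of the loop decaying. The interesting region is right around 1, where tiny knob movements decide whether a sound appears at all.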
As rich and surprisingly large as the space of feedback mixer sounds is, it's pretty distinctly within noise-drone-ambient world. You can make a lovely drone, you can make a wall of noise, you can make little squawking sounds, you can float through space. But I found myself yearning to hold on to one sound while searching for its perfect complement, or to condense a minute's worth of exploration into a short pattern, or to build whole stacks of sounds and rapidly switch between them. One option would be to ravenously seek more channels, but lacking a bigger mixer to abuse I turned to my old friend, computers. At first I just ran the mixer into Ableton. Simple stuff like grabbing a loop while continuing to tweak the mixer, or stacking a weird noisy tone into a weird noisy chord. Once the computer's in the loop, you can also expand the mixer's vocabulary by feeding processed sound back into the mixer. Slight frequency shifting of 0.1-10 Hz and reverb are particularly fertile. "Senseless Noise feat. Jeff" is an extremely goofy example of this setup, with me twirling the knobs and cool dude Jeff Mentch playfully tickling the default Ableton percussion synth. Also there are some blippy sine waves.
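Frequency shifting is not pitch shifting. It adds a fixed number of hertz to every component, which bends harmonic relationships; in a feedback loop, each pass around adds another few hertz. One standard way to do it is single-sideband modulation of the analytic signal. The sketch below computes the analytic signal offline with a naive DFT. That works for illustration but is not how a real-time effect would be built; those typically use a Hilbert-transform filter network instead.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Zero the negative frequencies and double the positive ones."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0  # the Nyquist bin is shared, so it keeps weight 1
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)

def frequency_shift(x, shift_hz, sample_rate):
    """Move every frequency component up by shift_hz (down if negative)."""
    z = analytic_signal(x)
    return [
        (z[t] * cmath.exp(2j * math.pi * shift_hz * t / sample_rate)).real
        for t in range(len(x))
    ]

sr = 200
tone = [math.cos(2 * math.pi * 20 * t / sr) for t in range(sr)]
shifted = frequency_shift(tone, 5.0, sr)  # now a 25 Hz cosine
```

Because the DFT treats the block as periodic, this is exact only for signals that repeat cleanly within the block, as the test tone does.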
Using a big nasty rigid-yet-complex GUI like Ableton with the no-input mixer felt all wrong; it's nice for recording and has a powerful sampler, but it really spoils the elegance of the no-input mixer. And it demands the extra moving part of a MIDI controller or at least a mouse. As an alternative I've finally been getting into live coding. The usual way of programming involves writing code in a text file, and then running it all at once. Maybe the program halts and you inspect the result, or maybe it runs in a loop doing something until you close it. Only then can you alter the code and try again. This is not an ideal paradigm for making music with code. A better one would be to have the program running as you write and revise it bit by bit, and have each change reflected in what you hear. Then instead of trying to program music ahead of time, the programming is the music. That's the idea of live coding. Some exponents of live coding are really into the performance angle, putting the programmer on a stage and projecting code for an audience to see.
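Much of the difference comes down to when code gets looked up. In the toy below, a clock asks for the current pattern at the start of every cycle, so swapping in a new definition changes the very next cycle without stopping anything. All names are hypothetical; this is not any real live coding system's API. The redefinition is triggered from a callback only so the example can run in one go. In a live setup it would come from you evaluating a new line of code while the clock keeps running.

```python
class LiveLoop:
    """A toy clock that re-reads its pattern at the top of every cycle."""

    def __init__(self, pattern):
        self.pattern = pattern  # any function: cycle number -> list of events

    def redefine(self, pattern):
        # Nothing restarts: the next cycle simply uses the new function.
        self.pattern = pattern

    def run(self, cycles, between_cycles=None):
        played = []
        for cycle in range(cycles):
            if between_cycles is not None:
                between_cycles(cycle, self)  # stand-in for the performer editing code
            played.append(self.pattern(cycle))
        return played

loop = LiveLoop(lambda cycle: ["bd", "sn"])

def performer(cycle, live):
    if cycle == 2:
        live.redefine(lambda cycle: ["bd", "hh", "sn", "hh"])

print(loop.run(4, performer))
```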
SuperCollider is three things: a flexible software synthesizer scsynth, a scripting language sclang to control it, and a development environment scide. With SuperCollider, you build synthesizers as graphs of unit generators which do simple things like generate a sine wave, delay a signal, apply a filter. Sclang can compactly do things which are tedious in a program like Max or Ableton and impossible with hardware, like "make 100 versions of this sound and spread them across the stereo field" or "randomize all the connections between these synthesizers". It does come with a steep learning curve. There are always several programming paradigms and syntax options to choose from; the separation between scsynth and sclang takes a while to wrap your head around; scsynth and scide can be temperamental; errors can be uninformative. It's enough to drive you off when you're starting out and just want to make a weird chord out of 100 sine waves. Nevertheless, I've been getting the hang of it and having a blast dreaming up wacky delay effects to use with the mixer.
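For comparison, here is roughly what "make 100 versions of this sound and spread them across the stereo field" amounts to when written out longhand in Python. It builds 100 slightly detuned sine waves and places each at its own position with an equal-power pan law. The detuning, base frequency and pan law are arbitrary choices for illustration; in sclang the same idea can fit in a line or two.

```python
import math

def equal_power_pan(pos):
    """Map pos in [-1, 1] to (left, right) gains with left^2 + right^2 == 1."""
    angle = (pos + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def spread_sines(n_voices=100, base=220.0, detune=0.5, sr=44100, n_samples=4410):
    """Sum n_voices detuned sines, panned evenly from hard left to hard right."""
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    for v in range(n_voices):
        freq = base + detune * v
        pos = 0.0 if n_voices == 1 else -1.0 + 2.0 * v / (n_voices - 1)
        gain_l, gain_r = equal_power_pan(pos)
        w = 2.0 * math.pi * freq / sr
        for i in range(n_samples):
            s = math.sin(w * i) / n_voices  # scale so the sum can't exceed 1
            left[i] += gain_l * s
            right[i] += gain_r * s
    return left, right

left, right = spread_sines()
```

The equal-power law keeps each voice equally loud wherever it sits, instead of dipping in the middle as a linear crossfade would.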
Tidal is complementary to SuperCollider. It doesn't deal directly with sound, but focuses on music as a sequence of discrete events. Like Ableton Live, it enshrines pulse: everything is a loop. Very unlike Ableton, it brings all the machinery of maximally dorky pure functional programming to bear on the definition of musical patterns. Tidal is two things: a sublanguage of Haskell, and a runtime which maintains a clock and emits OSC messages. With Tidal, you write a simple pattern as something like sound "bd sn bd sn", meaning "kick, snare, kick, snare". The part in quotes isn't a string, but Tidal's special pattern literal. You construct different rhythms using nesting and rests, e.g. "hh [hh [~ hh]]". Immediately, I liked this much better than clicking around a piano roll (though it's obviously very different from playing on a controller). You can then pattern other parameters by adding something like # pan "0 1" # gain "0.1 0.2 0.3". Each pattern can have its own meter. You can transform a pattern by e.g. changing the speed; you can superimpose or concatenate lists of patterns. And since Tidal is a language, you can do all that recursively, building complex patterns up of simple atoms, inventing and reusing processes. Tidal can send arbitrary MIDI and OSC, but it's really made to use with its own special sampler called Dirt (which has a SuperCollider implementation). I've been recording trajectories of a few minutes through computer-augmented no-input mixer space, then using Tidal+Dirt to chop them up into patterns. "Improv December 17" is made from one long mixer recording and a bass drum sample that comes with Dirt.
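To make the pattern literal less mysterious, here is a toy re-implementation of a tiny corner of it in Python. Spaces divide a cycle into equal steps, brackets subdivide a step, and ~ is a rest. It also sketches the way # takes its rhythm from the pattern on the left and looks up parameter values from the pattern on the right, which is how each pattern can keep its own meter. This is an illustration under simplifying assumptions (a single cycle, no other mini-notation features), not Tidal's implementation or API.

```python
from fractions import Fraction

def tokenize(src):
    """Split mini-notation into words and brackets."""
    return src.replace("[", " [ ").replace("]", " ] ").split()

def parse(tokens):
    """Build a nested list: each [ ... ] group becomes a sub-list."""
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unmatched ]")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched [")
    return stack[0]

def to_events(tree, start=Fraction(0), span=Fraction(1)):
    """Give each step an equal share of the span; groups subdivide their share."""
    if not tree:
        return []
    step = span / len(tree)
    events = []
    for i, node in enumerate(tree):
        onset = start + i * step
        if isinstance(node, list):
            events.extend(to_events(node, onset, step))
        elif node != "~":  # "~" is a rest: it takes up time but makes no event
            events.append((onset, step, node))
    return events

def pattern(src):
    """One cycle of events as (onset, duration, value), times in fractions of a cycle."""
    return to_events(parse(tokenize(src)))

def fast(events, factor):
    """Squeeze `factor` repetitions of a one-cycle pattern into one cycle."""
    return [((onset + rep) / factor, dur / factor, value)
            for rep in range(factor) for onset, dur, value in events]

def with_param(structure, values):
    """Rough analogue of #: rhythm from the left, values sampled at each onset."""
    combined = []
    for onset, dur, value in structure:
        param = next((v for s, d, v in values if s <= onset < s + d), None)
        combined.append((onset, dur, value, param))
    return combined

print(pattern("hh [hh [~ hh]]"))
print(with_param(pattern("bd sn bd sn"), pattern("0.1 0.2 0.3")))
```

Exact fractions keep nested subdivisions like 7/8 free of rounding error, which matters once patterns get sped up and stacked.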
Using Tidal is a lot like programming in Haskell, because that's what it is. It feels natural to work with lists and higher order functions, but deeply weird when you want to do something in an imperative style. Haskell is sort of an odd match for live coding. In the middle of a groove, the last thing I want to worry about is whether to use floor to make sure some number is the right type to play nice with a Tidal function. On the other hand, Haskell and the ingenious pattern literal make for an incredibly concise syntax. And though Haskell's strong type safety can add cognitive load I'd rather spend on musical matters, it also makes it harder to break: a mistake is more likely to be rejected by the interpreter (doing nothing) than to do something unexpected to the sound.
Tidal and SuperDirt are pretty experimental software, and there are some rough edges. Not all the documentation is there, and some things just aren't working for me. Sometimes samples don't fade in right and pop, the built-in delay effect acts bizarrely, and it can be awkward to deal with long samples. Right off the bat, I had to build the latest version to fix a fatal bug. There are some sampler features I miss like having expressive envelopes; I'd also like more flexible EQ. If I can get fluent at SuperCollider, I may try to implement some of these things myself.
So far all my SuperCollider and Tidal work is live-coded spaghetti, but eventually I hope to pack some of it into nice libraries. Stay tuned!
Side note: lately I've been using an Ubuntu desktop for music. To record, edit and master material coming out of SuperCollider, I tried out Ardour for the first time in years. I was impressed by everything until I discovered that automation curves wouldn't draw right. So close! I also tried out the free Calf plugins, which are very flexible and sound great. Seems they've been around for a while, but I never knew about them before. The multiband compressor and gate effects worked well for massaging the sound of a dense stereo recording.
This project was my attempt to incorporate a neural network trained to encode images into a video feedback process.
In spring of 2015 I took a seminar in deep learning which got me real excited. Machine learning is the study of general methods for problem solving using optimization and data. Deep learning is a particular approach to ML using models with many differentiable layers of parameters. Especially interesting to me was representation learning: using ML methods to extract meaningful features from "raw" data like the pixels of an image. And I'd been talking a lot to Parag Mital and Andy Sarroff about their respective work with deep learning, sound and video. But what freaked me out the most about deep learning was the similarity between neural networks and the audio/video feedback I'd been using to make noise.
The kind of digital video feedback I'd been playing with was superficially like a recurrent neural network. At each time step, the current frame of video would be computed from the last (and optionally, the current frame of an input video). There would first be some linear function from images to images, like translation or blurring; generally, each pixel would take on a linear combination of pixels in the last frame and input frame. Then, there would be some pixel-wise bounded nonlinearity to keep the process from blowing up, like wrapping around [0, 1] or sigmoid squashing. That's the architecture of an RNN. The only difference was that rather than represent the linear transformation as a big ol' parameter matrix, I would hand-craft it from a few sampling operations in a fragment shader. And instead of training by backpropagation to do some task, I would fiddle with it manually until it had visually interesting dynamics.
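Written out, one frame of such a process has exactly the shape of a simple RNN cell, h_t = f(W h_{t-1} + U x_t + b). The sketch below makes that literal for a tiny grayscale frame. It builds the dense matrix W that a wraparound five-tap blur corresponds to, takes U as u times the identity, and applies a sigmoid as the bounded nonlinearity. The sizes, weights, gain and bias are arbitrary. A shader computes the same thing by sampling neighbors, without ever storing W.

```python
import math

def blur_matrix(height, width, weight=0.2):
    """Dense W for a wraparound five-tap blur of a flattened height*width frame."""
    n = height * width
    W = [[0.0] * n for _ in range(n)]
    for y in range(height):
        for x in range(width):
            i = y * width + x
            for dy, dx in ((0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)):
                j = ((y + dy) % height) * width + (x + dx) % width
                W[i][j] += weight  # += handles tiny grids where neighbors coincide
    return W

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_frame(h_prev, x_t, W, u=0.5, gain=4.0, bias=-2.0):
    """h_t = sigmoid(gain * (W h_prev + u * x_t) + bias): linear map, then squashing."""
    pre = matvec(W, h_prev)
    return [sigmoid(gain * (p + u * xi) + bias) for p, xi in zip(pre, x_t)]

W = blur_matrix(4, 4)
frame_in = [0.0, 1.0] * 8  # stand-in for an input video frame
h = [0.0] * 16
for _ in range(10):
    h = rnn_frame(h, frame_in, W)
```

Training this by backpropagation would mean treating the entries of W (or the few numbers the shader uses to build it) as parameters, which is the direction the next paragraph considers.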
I might have stopped there and tried to make my video-RNN parameters trainable. But to do what? It was pretty clear I wouldn't make much headway on synthesis of natural video in two weeks, without experience in deep learning software frameworks, and without even a GPU to run on. I wanted a toy-sized problem which might still result in a cool interactive video process. So I came up with a different approach: rather than try to train a recurrent network I would train a feedforward convolutional network, then transplant its parameters into a still partially hand-constructed video process. I came up with a neat way to do that: my CNN would be arranged as an autoencoder. It would have an hourglass shape, moving information out of 2-D image space and into a dense vector representation (which I vaguely hoped would make the network implement a "hierarchy of abstraction"). This would mean that I could bolt an "abstraction dimension" onto the temporal and spatial dimensions of a video feedback process. The autoencoder would implement "texture sampling" from the "less abstract" layer below and "more abstract" layer above. Then I could fiddle with the dynamics by implementing something like "each layer approaches the previous time-step minus the layer above plus the layer below, squashed".
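The layered update rule can be sketched as follows. Everything here is an assumption made for illustration:

- layers are flat lists, and layers[0] is the image layer
- enc[l] is a hypothetical weight matrix mapping layer l up to layer l + 1
- the decoder reuses the transpose of enc[l] (tied weights)
- the bottom layer is driven by an input frame
- tanh does the squashing

It is meant to show the bookkeeping of "the previous time step minus the layer above plus the layer below", not the trained Caffe network.

```python
import math

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def layered_step(layers, enc, input_frame):
    """One synchronous update of every layer from the previous time step.

    enc[l] has shape (len(layers[l + 1]), len(layers[l])).
    """
    new_layers = []
    top = len(layers) - 1
    for l, h in enumerate(layers):
        # "Layer below": the encoder's view of layer l - 1, or the input frame at the bottom
        below = matvec(enc[l - 1], layers[l - 1]) if l > 0 else input_frame
        # "Layer above": a decoding of layer l + 1 through the transposed encoder weights
        above = matvec(transpose(enc[l]), layers[l + 1]) if l < top else [0.0] * len(h)
        new_layers.append([math.tanh(p - a + b) for p, a, b in zip(h, above, below)])
    return new_layers

# Hypothetical tiny "hourglass": 4 pixels -> 2 features -> 1 code
enc = [[[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.3, 0.2]], [[0.7, -0.1]]]
state = [[0.0] * 4, [0.0] * 2, [0.0]]
for _ in range(10):
    state = layered_step(state, enc, [0.2, 0.8, 0.5, 0.1])
```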
I almost bit off more than I could chew for a seminar project: my approach demanded that I design and train my own neural network with Caffe and re-implement the forward pass with OpenGL and spend time exploring the resultant dynamics. I was able to train my autoencoders on CIFAR with some success, and I was able to make some singular boiling multicolored nonsense. But I didn't get the spectacular emergence of natural image qualities I hoped for.
Here's the GitHub, which includes a technical writeup, a Jupyter notebook with the autoencoder experiments in it, and the (probably very brittle) source code for an openFrameworks app which runs the process interactively, optionally with webcam input. It's based on early 2015 versions of Caffe and openFrameworks. I may still try to get the openFrameworks app running again and capture some video, for posterity.
A few months later deep dream came out. Deep dream does a similar thing: it iteratively alters an image using a pre-trained CNN to manifest natural image qualities. The trick to deep dream is that the mechanism is the same as training the network, optimizing inputs instead of parameters. Vanilla deep dream converges, but it's simple to make a dynamic version by incorporating infinite zoom or similar. Too bad I didn't get into the filter visualization papers for this project -- I failed to realize that backpropagation could do exactly what I wanted!
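The core trick fits in a few lines: freeze the weights and run gradient ascent on the input to increase some layer's activations. This toy uses a single made-up linear+ReLU layer with a hand-derived gradient in place of a real pretrained CNN and autodiff, so it only illustrates the mechanism. Actual deep dream also normalizes the gradient step and typically processes the image at several scales.

```python
def relu_layer(x, W):
    """Activations of a frozen layer: relu(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def objective(x, W):
    """How strongly the layer responds: 0.5 * sum of squared activations."""
    return 0.5 * sum(a * a for a in relu_layer(x, W))

def dream_step(x, W, step_size=0.05):
    """Move the input (not the weights) uphill on the objective."""
    grad = [0.0] * len(x)
    for a, row in zip(relu_layer(x, W), W):
        if a > 0.0:  # ReLU passes gradient only through active units
            for j, w in enumerate(row):
                grad[j] += a * w  # d(0.5 * a^2)/dx_j = a * w_j when the unit is active
    return [xi + step_size * g for xi, g in zip(x, grad)]

W = [[1.0, -0.5, 0.2], [0.3, 0.8, -0.4]]
x = [0.1, 0.2, 0.3]
for _ in range(20):
    x = dream_step(x, W)
```

Training would compute the same kind of gradient with respect to W and update W. Here x is updated and W stays fixed. Interleaving each step with a small zoom or shift of x is one way to get the never-settling, dynamic version described above.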