Making Music With No-input Mixer, SuperCollider, and Tidal

No-input Mixer

Years ago I got one of these cheap mixers so I could record and amplify dorm room nonsense music sessions. But in grad school I had access to better recording facilities and the mixer was gathering dust. Also around that time, cool dude Carlos Dominguez introduced me to the concept of "no-input" mixing. Normally a mixer is the middleman between an instrument or microphone and a loudspeaker or recording device. It alters and facilitates sounds, but doesn't make sound. No-input mixing is total misuse of the mixer: you plug the mixer back into itself, as its own sole input. The mixer self-oscillates and makes its own sounds. It's the same principle as an electric guitar or microphone feeding back. What's funny and exciting about the no-input mixer is how rich and diverse it sounds, given that it isn't supposed to make any sound at all. The whole point of a mixer is to have fine control over sound routing and equalization, so even a smallish stereo mixer has endless possible configurations. And because the oscillations depend on a delicate balance of amplification, the no-input mixer is an enormously sensitive instrument. Playing one means moving a single knob slowly enough to find the precise edge of chaos between two sounds; sweeping it quickly enough to carve a tiny blip out of a squall; listening closely enough to know when it's about to blow up. You learn the feel of a particular mixer, but it's still new every time. For some masterful no-input mixing, check out Toshimaru Nakamura.

As rich and surprisingly large as the space of feedback mixer sounds is, it's pretty distinctly within noise-drone-ambient world. You can make a lovely drone, you can make a wall of noise, you can make little squawking sounds, you can float through space. But I found myself yearning to hold on to one sound while searching for its perfect complement, or to condense a minute's worth of exploration into a short pattern, or to build whole stacks of sounds and rapidly switch between them. One option would be to ravenously seek more channels, but lacking a bigger mixer to abuse I turned to my old friend, computers. At first I just ran the mixer into Ableton. Simple stuff like grabbing a loop while continuing to tweak the mixer, or stacking a weird noisy tone into a weird noisy chord. Once the computer's in the loop, you can also expand the mixer's vocabulary by feeding processed sound back into the mixer. Slight frequency shifting of 0.1-10 Hz and reverb are particularly fertile. "Senseless Noise feat. Jeff" is an extremely goofy example of this setup, with me twirling the knobs and cool dude Jeff Mentch playfully tickling the default Ableton percussion synth. Also there are some blippy sine waves.

Live Coding

Using a big nasty rigid-yet-complex GUI like Ableton with the no-input mixer felt all wrong; it's nice for recording and has a powerful sampler, but it really spoils the elegance of the no-input mixer. And it demands the extra moving part of a MIDI controller or at least a mouse. As an alternative I've finally been getting into live coding. The usual way of programming involves writing code in a text file, and then running it all at once. Maybe the program halts and you inspect the result, or maybe it runs in a loop doing something until you close it. Only then can you alter the code and try again. This is not an ideal paradigm for making music with code. A better one would be to have the program running as you write and revise it bit by bit, and have each change reflected in what you hear. Then instead of trying to program music ahead of time, the programming is the music. That's the idea of live coding. Some exponents of live coding are really into the performance angle, putting the programmer on a stage and projecting code for an audience to see.

Lately I've been learning to use two live-coding music tools, SuperCollider for audio and Tidal for patterns.

SuperCollider is three things: a flexible software synthesizer (scsynth), a scripting language (sclang) to control it, and a development environment (scide). With SuperCollider, you build synthesizers as graphs of unit generators which do simple things like generate a sine wave, delay a signal, or apply a filter. Sclang can compactly do things which are tedious in a program like Max or Ableton and impossible with hardware, like "make 100 versions of this sound and spread them across the stereo field" or "randomize all the connections between these synthesizers". It does come with a steep learning curve. There are always several programming paradigms and syntax options to choose from; the separation between scsynth and sclang takes a while to wrap your head around; scsynth and scide can be temperamental; errors can be uninformative. It's enough to drive you off when you're starting out and just want to make a weird chord out of 100 sine waves. Nevertheless, I've been getting the hang of it and having a blast dreaming up wacky delay effects to use with the mixer.

Tidal is complementary to SuperCollider. It doesn't deal directly with sound, but focuses on music as a sequence of discrete events. Like Ableton Live, it enshrines pulse: everything is a loop. Very unlike Ableton, it brings all the machinery of maximally-dorky pure functional programming to bear on the definition of musical patterns. Tidal is two things: a sublanguage of Haskell, and a runtime which maintains a clock and emits OSC messages. With Tidal, you write a simple pattern as something like sound "bd sn bd sn", meaning "kick, snare, kick, snare". The part in quotes isn't a string, but Tidal's special pattern literal. You construct different rhythms using nesting and rests, e.g. "hh [hh [~ hh]]". Immediately, I liked this much better than clicking around a piano roll (though it's obviously very different from playing on a controller). You can then pattern other parameters by adding something like # pan "0 1" # gain "0.1 0.2 0.3". Each pattern can have its own meter. You can transform a pattern by e.g. changing the speed; you can superimpose or concatenate lists of patterns. And since Tidal is a language, you can do all that recursively, building complex patterns up out of simple atoms, inventing and reusing processes. Tidal can send arbitrary MIDI and OSC, but it's really made to be used with its own special sampler called Dirt (which has a SuperCollider implementation). I've been recording trajectories of a few minutes through computer-augmented no-input mixer space, then using Tidal+Dirt to chop them up into patterns. "Improv December 17" is made from one long mixer recording and a bass drum sample that comes with Dirt.
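
To give a flavour of how these pieces fit together, here's a rough sketch (the "mixer" sample name is hypothetical, standing in for a folder of no-input recordings; "bd" and "hh" come with Dirt):

    -- four kicks per cycle against nested hi-hats with a rest
    d1 $ sound "bd*4 [hh [~ hh]]" # gain "0.9 1"

    -- chop a long mixer recording into 16 slices, reverse their order every
    -- second cycle, and sweep the result across the stereo field
    d2 $ every 2 rev $ chop 16 $ sound "mixer" # pan sine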

Using Tidal is a lot like programming in Haskell, because that's what it is. It feels natural to work with lists and higher-order functions, but deeply weird when you want to do something in an imperative style. Haskell is sort of an odd match for live coding. In the middle of a groove, the last thing I want to worry about is whether to use fromRational or floor to make sure some number is the right type to play nice with a Tidal function. On the other hand, Haskell and the ingenious pattern literal make for an incredibly concise syntax. And though Haskell's strong type safety can add cognitive load I'd rather spend on musical matters, it also makes it harder to break: a mistake is more likely to be rejected by the interpreter (doing nothing) than to do something unexpected to the sound.
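
A tiny example of that friction (hypothetical, but representative): slow expects a Pattern Time, so a bare numeric literal just works, while a number that shows up as an Int from somewhere else needs converting first.

    -- a bare literal is polymorphic, so this is fine
    d1 $ slow 2 $ sound "bd sn"

    -- an Int computed elsewhere has to be coerced before the types line up
    let bars = 3 :: Int
    d1 $ slow (fromIntegral bars) $ sound "bd sn"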

Tidal and SuperDirt are pretty experimental software, and there are some rough edges. Not all the documentation is there, and some things just aren't working for me. Sometimes samples don't fade in right and pop, the built-in delay effect acts bizarrely, and it can be awkward to deal with long samples. Right off the bat, I had to build the latest version to fix a fatal bug. There are some sampler features I miss, like expressive envelopes; I'd also like more flexible EQ. If I can get fluent at SuperCollider, I may try to implement some of these things myself.

So far all my SuperCollider and Tidal work is live-coded spaghetti, but eventually I hope to pack some of it into nice libraries. Stay tuned!

Side note: lately I've been using an Ubuntu desktop for music. To record, edit and master material coming out of SuperCollider, I tried out Ardour for the first time in years. I was impressed by everything until I discovered that automation curves wouldn't draw right. So close! I also tried out the free Calf plugins, which are very flexible and sound great. Seems they've been around for a while, but I never knew about them before. The multiband compressor and gate effects worked well for massaging the sound of a dense stereo recording.

Video Synthesis With Convolutional Autoencoders

This project was my attempt to incorporate a neural network, trained to encode images, into a video feedback process.

In spring of 2015 I took a seminar in deep learning which got me real excited. Machine learning is the study of general methods for problem solving using optimization and data. Deep learning is a particular approach to ML using models with many differentiable layers of parameters. Especially interesting to me was representation learning: using ML methods to extract meaningful features from "raw" data like the pixels of an image. And I'd been talking a lot to Parag Mital and Andy Sarroff about their respective work with deep learning, sound and video. But what freaked me out the most about deep learning was the similarity between neural networks and the audio/video feedback I'd been using to make noise.

The kind of digital video feedback I'd been playing with was superficially like a recurrent neural network. At each time step, the current frame of video would be computed from the last (and optionally, the current frame of an input video). There would first be some linear function from images to images, like translation or blurring; generally, each pixel would take on a linear combination of pixels in the last frame and input frame. Then, there would be some pixel-wise bounded nonlinearity to keep the process from blowing up, like wrapping around [0, 1] or sigmoid squashing. That's the architecture of an RNN. The only difference was that rather than represent the linear transformation as a big ol' parameter matrix, I would hand-craft it from a few sampling operations in a fragment shader. And instead of training by backpropagation to do some task, I would fiddle with it manually until it had visually interesting dynamics.
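
Written out roughly, with x_t the current frame, u_t the optional input frame, W and U the hand-built linear warps, and \sigma the pixel-wise squashing:

    x_t = \sigma(W x_{t-1} + U u_t)

which is exactly the shape of a vanilla RNN update, except that here W and U came from shader sampling operations rather than from backpropagation.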

I might have stopped there and tried to make my video-RNN parameters trainable. But to do what? It was pretty clear I wouldn't make much headway on synthesis of natural video in two weeks, without experience in deep learning software frameworks, and without even a GPU to run on. I wanted a toy-sized problem which might still result in a cool interactive video process. So I came up with a different approach: rather than try to train a recurrent network, I would train a feedforward convolutional network, then transplant its parameters into a still partially hand-constructed video process. I found a neat way to do that: my CNN would be arranged as an autoencoder. It would have an hourglass shape, moving information out of 2-D image space and into a dense vector representation (which I vaguely hoped would make the network implement a "hierarchy of abstraction"). This would mean that I could bolt an "abstraction dimension" onto the temporal and spatial dimensions of a video feedback process. The autoencoder would implement "texture sampling" from the "less abstract" layer below and "more abstract" layer above. Then I could fiddle with the dynamics by implementing something like "each layer approaches the previous time-step minus the layer above plus the layer below, squashed".
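
One way to write that last rule in symbols (the notation is mine, not from the original writeup): if h_l^{(t)} is layer l's state at time t, e_l the learned encoder step mapping layer l-1 up to l, d_l the decoder step mapping layer l+1 down to l, and \sigma a squashing nonlinearity, the update is something like

    h_l^{(t)} = \sigma\left( h_l^{(t-1)} - d_l(h_{l+1}^{(t-1)}) + e_l(h_{l-1}^{(t-1)}) \right)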

I almost bit off more than I could chew for a seminar project: my approach demanded that I design and train my own neural network with caffe and re-implement the forward pass with OpenGL and spend time exploring the resultant dynamics. I was able to train my autoencoders on CIFAR with some success, and I was able to make some singular boiling multicolored nonsense. But I didn't get the spectacular emergence of natural image qualities I hoped for.

Here's the GitHub repo, which includes a technical writeup, a Jupyter notebook with the autoencoder experiments in it, and the (probably very brittle) source code for an openFrameworks app which runs the process interactively, optionally with webcam input. It's based on early 2015 versions of caffe and openFrameworks. I may still try to get the openFrameworks app running again and capture some video, for posterity.

A few months later deep dream came out. Deep dream does a similar thing: it iteratively alters an image using a pre-trained CNN to manifest natural image qualities. The trick to deep dream is that the mechanism is the same as training the network, optimizing inputs instead of parameters. Vanilla deep dream converges, but it's simple to make a dynamic version by incorporating infinite zoom or similar. Too bad I didn't get into the filter visualization papers for this project -- I failed to realize that backpropagation could do exactly what I wanted!
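
Roughly: where training does gradient descent on the parameters to reduce a loss, deep dream holds the parameters fixed and does gradient ascent on the image x to amplify some chosen layer's activations a_l(x), something like

    x \leftarrow x + \eta \, \frac{\partial}{\partial x} \left\lVert a_l(x) \right\rVert^2

with \eta a step size; the dynamic versions just add a small zoom or warp between steps so the image keeps evolving.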