The V Motion Project from Assembly on Vimeo.

It’s all real – in a manner of speaking. And it’s all real-time. But just what is a live performance made with cameras, gestures, and projection? It’s worth watching The V Motion Project and pondering those possibilities, amidst the flashy visual eye candy.

It’s certainly optically impressive. It’s music made to be watched (and, in the video, filmed with iPhones and whatnot). Watch a second time, and you wonder: as we reach a new peak of maturity, decades into alternative interface design, what will come next?

To say that this is a kind of special effect is not a criticism. Spectacle is part of the message here. Instead of tweaking knobs and controls, or, indeed, fingering frets that people can’t see on a guitar, full-body Kinect performance could accurately be described as a kind of futuristic circus act. You might wonder which came first – big-gesture computer vision tracking, or “dubstep” music that distorts sounds by pushing live effect parameters to their extremes.

What it isn’t is “fake” – that is, this isn’t just someone waving arms to a pre-produced track. Ableton Live provides the soundtrack. Many hours of work went into the project, and it has lots of pieces, but the essential tool for working with Kinect is from developer Ben Kuper.

That tool is itself an embodiment of the maturity of Kinect development, which really began as hacks and proofs of concept. Not only is a big ad firm (BBDO, anyone?) hopping on board here, but the sophistication of mapping the actual video messages has improved. Kinect creation is still very much a matter of getting intimate with development, if you want to master the relationship of gesture and sound. But that development doesn’t have to reinvent the wheel every single time.

And this collaboration is all about the artists. From the description, some of the team involved:

This project combines the collective talents of musicians, dancers, programmers, designers and animators to create an amazing visual instrument. Creating music through motion is at the heart of this creation and uses the power of the Kinect to capture movement and translate it into music which is performed live and projected on a huge wall.
We created and designed the live visual spectacle with a music video being produced from the results. We wanted it to be clear that the technology was real and actually being played live. The interface plays a key role in illustrating the idea of the instrument and we designed it to highlight the audio being controlled by the dancer. Design elements like real time tracking and samples being drawn on as they are played all add to authenticity of the performance. The visuals are all created live and the music video is essentially a real document of the night.
Check out the tech behind the project here:
Jeff Nusz
Paul Sanderson
Joel Little
James Hayday
Josh Cesan
Agency: Colenso BBDO
Client: Frucor

That blog post goes into excruciatingly fine detail (in a good way). But just a few bits will make sense to Live users and Kinect hackers alike in pretty short order. Here’s the Live set itself – the song is broken into a fair number of stems, clips, and effects:

V Motion Project : Ableton Live setup from Jeff Nusz on Vimeo.

And here’s the video I actually like better than the flashy promo at top. It shows not just one neat trick, but a suite of controller tools working in harmony, each with a novel mode of interaction and graphical representation.

V Motion Project : Instrument Demo from Jeff Nusz on Vimeo.

The visuals are compelling, but I’m intrigued by the musical element precisely because it gets at the heart of the interaction.

As amazing as this looks, it also presents some challenges – or, if you like, some opportunities for future works. In order to pull off this big ensemble of controllers, the actual scheme for setting up the track becomes more rigid. For the brief here, that’s perfect: you want a human bobbing around in the midst of sci-fi visuals acting out this particular song. But the price of that wonderful spectacle is giving up the use of this as a more flexible instrument. That is, while we’ve traded in our boring knobs and faders and such, we wind up with something that’s potentially less interesting to actually play … even if it looks the business while we’re doing it. But I put that out there as a bit of a challenge, more than a criticism, and certainly many Kinect experimenters are playing with this technology to see if they can make something more playable. (As to odds of success, the jury is still out.)

Regardless, perhaps beyond the specific Kinect technology or even the style of music or interaction, what we’re seeing is a convergence of media. It’s performance that involves choreography, visuals, sound, and sensed gestural input.

It’s an impressive work. (Go, New Zealand!) It’s all-encompassing, all-stops-pulled audiovisual immersion.

And pulling all those stops may be the surest way to really test what this stuff is about.

For a bit more gentle work, see The Human Equalizer, below. (And thanks to everyone who sent this in, but particularly Dave Astles and Fiber Festival’s Jarl Schulp.)

The Human Equalizer 1.0 from Bram Snijders / SITD on Vimeo.

The Human Equalizer [ T.H.E. 1.0 ] is an interactive audio-visual installation.
T.H.E.1.0 is about the ability to generate your own audio experience. A virtual point-cloud of musical data is constructed out of digital matter, combined with physical exercise.
You can visualise the installation, the field of interaction, as a multitude of buttons.
You are the musician by your x y z coordinates being T.H.E. buttons, potentiometers, and the strings of your “air-guitar”.
The essence of the installation is a virtual point-cloud of information: integers, floats, music notes, tones, waves, vibrations and light. Your body is a variable of values which you can connect to ‘T.H.E.’ system.
[ T.H.E.1.0 was nominated for the YOUNG-BLOOD-AWARD-2011 and was exhibited at the GOGBOT festival 2011 ]
An installation by Bram Snijders
Audio infrastructure by Tijs Ham

  • Momo the Monster

    Fantastic! I love the fine line they’re toeing between game and instrument, and having an enthusiastic and dance-conscious performer goes a long way for fleshing out the aesthetic of the user interface.

  • Ronnie

    Visually, it’s just gorgeous. Well done!

  • Tim Thompson

    Wonderful work, amazing team effort!  The most interesting thing for me (as a Kinect hacker) is how they’re using two Kinects pointed in the same direction, and wiggling one of them in order to make it work – see for the description.  The description also points out that using the depth data can make for a more responsive instrument.  I’ve certainly found the depth data to be more than sufficient (by itself) to create a very playable instrument. The additional benefits of using only the depth are that you can use things other than your body to play the instrument (like a swinging tennis ball :-), and you can introduce other physical items to use as frames of reference for control, allowing more freedom in what gets projected.  See  for an example.

  • kent williams

    I’d love to see it used to do something besides play a cliched pop Dubstep song.  Technically it’s interesting, but is it really something more than an overgrown Kaoss Pad?

  • Graham

    Pretty cool, but I have to say that it really reminds me of Mandala on the Amiga and that was many many years ago. It’s pretty hard to believe that we’re just getting back to this after, what, maybe 15 years or more?

    • Gust

       THANK YOU. Beat me to it. Waving your arms in the air came and went in the 1990’s. It only takes the time for people to forget that this was already explored and away we go again. Learning curve. Dead End. Forget. Learning Curve. Dead End. Forget.

  • Samuel Process

    Ridiculous, sorry… like old stupid dance music with cheap 3D video clips !

  • James Husted

    After all the recent huff after the Deadmau5 article one would hope that this makes its way to the more mainstream acts. There would at least be SOME of the “push play” questions answered.

  • P Rayjer

    What’s next? How about we throw each other a couple of Pepsis at the end of the alley once the gig is done? That’s where it feels like it’s headed… I kinda feel like Kinect is the Autotune of the video world. It’s still waving your arms about. And since when did a musician become a dancer? Of course many musicians do dance on stage, but now the big answer to shows behind laptops and so-called boring controller knobs – is to gaff about on stage with your arms in the air? Really. The Kinect has a lot to answer for. It’s not the answer to any musical problems – good music is. And it’s pretty clear that the generic dubstep on display here will get the extremely talented tech team a solid gig on the next Pendulum tour…

  • howthebodyworks

    Attention conservation notice: Geeky tech standards rant ahead.

    Over at the blog post summarising the project, the creators mention their glue of choice, and it ain’t OpenSoundControl:

    “The two computers talk to each other over UDP, sending simple JSON objects back and forth on each frame.”

    I’m no fan of MIDI, but OSC’s lack of evolution compared to the modern structured data standard of choice, JSON, is getting painfully apparent. Harbinger of the realtime web sweeping all before it, if you ask me, this control-via-JSON thing.
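For a sense of what "simple JSON objects over UDP on each frame" can look like, here is a minimal Python sketch. The project's actual message format isn't given here, so the field names (`frame`, `joints`) and the helper functions are hypothetical, chosen only for illustration.

```python
import json
import socket


def encode_frame(frame_number, joints):
    """Pack one frame of tracking data as a compact UTF-8 JSON datagram.

    The "frame" and "joints" keys are assumptions, not the project's schema.
    """
    payload = {"frame": frame_number, "joints": joints}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def decode_frame(datagram):
    """Turn a received datagram back into a Python dictionary."""
    return json.loads(datagram.decode("utf-8"))


def loopback_demo():
    """Send one frame to a UDP socket on this machine and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(1.0)         # UDP gives no delivery guarantee
        message = encode_frame(1, {"left_hand": [0.1, 0.5, 1.2]})
        sender.sendto(message, receiver.getsockname())
        datagram, _ = receiver.recvfrom(65535)
        return decode_frame(datagram)
    finally:
        sender.close()
        receiver.close()
```

Calling `loopback_demo()` sends a single frame to a socket on the same machine and returns the decoded dictionary. Note that nothing in the datagram carries timing information unless you add a field for it yourself.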

    • Peter Kirn

      Whoa, I missed that bit.

      I’ve researched JSON for this sort of application. There are two problems: one, you lose timecode. And two, JSON isn’t optimized for real-time data transmission. It’s not clear what their reasoning was for this choice, either (my guess would be familiarity).

      OSC is set for some ongoing evolution. My feeling is that JSON makes sense for things like persistent data storage (say, storing information about your performance in a file), and OSC is what you’d likely use for this sort of application.

      Given what they’re doing here, frankly, any number of data protocols could easily work.

    • howthebodyworks

      Absolutely – many protocols could work here. And I do use OSC for this kind of thing, I just don’t enjoy it.

      I do feel that as a protocol it’s falling behind better alternatives. Don’t get me wrong, JSON-over-HTTP is not the platonic ideal of a great control mechanism either. But for practical purposes, it’s better at the moment. I’d really like OSC to be better, very much so. But for various reasons it appears to be frozen in time while the Web Way Of Doing Things moves forward. I’d be very interested to hear about movements on the OSC front, so I’ll await your news on it, for sure.

      Practically, the world of interconnecting controllers is still not great, and it would be great to move it forward. If OSC loses (has lost?) momentum, and yet web technology is surging forward with dynamic, flexible, open tools, I’d be hitching my wagon to the latter, despite my qualms, because at least there the critical mass of developers and standards nuts will reduce the chance that I’ll be trapped in a standards ghetto. OTOH, I am indebted to Ross Bencina for pointing out that HD-MIDI might be a good banner to rally behind, what with its industry backers.

      But then again, if there is movement on the community-driven OSC front, I’d be really happy to rally about it. All my current performance software is built around it. (Well, except for the MIDI bits…)

      For the gory, possibly OT, details of my personal beefs with OSC (including an attempt to address your concern about conflating issues, @Peter), read on. Or if we should be taking this discussion to a more on-topic forum… well, Peter, you’re the boss of here.

      OSC has at least 2 things going for it that few other protocols do – native support for timestamps, and, in principle, a simple implementation that could be handled fast even by embedded hardware. Those are good things. However, the former is only sometimes supported, and often not that useful, and the latter is purely theoretical. The latter, OSC’s simplicity, is often only an in-principle thing, not an in-practice thing. Whilst the 4-byte bundling thing makes it easy to parse, in practice there are so many edge-cases and nastiness (nested, timestamped bundles, many different types, such as varying float precisions, booleans that are different types and so on) that the OSC implementation for any tool you care to name (except possibly SuperCollider) is at best partial. Python, C, Ruby, puredata, reaktor etc, all support different subsets of the spec. If you just want to ignore the protocol’s features and treat it like high-precision MIDI, it does OK.
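The "easy to parse" part can be seen by building the simplest kind of OSC 1.0 message by hand: an address string, a type tag string, and big-endian 32-bit floats, with each string null-terminated and padded to a multiple of 4 bytes. This is a sketch of that minimal subset only (no bundles, timestamps, or other types), and the address `/fx/cutoff` is a made-up example.

```python
import struct


def osc_pad(raw):
    """Null-terminate a byte string and pad it to a multiple of 4 bytes."""
    raw += b"\x00"                          # OSC strings always end in a null
    return raw + b"\x00" * (-len(raw) % 4)  # then pad up to the 4-byte boundary


def osc_message(address, *floats):
    """Encode an OSC message whose arguments are all float32 values."""
    type_tags = "," + "f" * len(floats)     # e.g. ",ff" for two floats
    return (
        osc_pad(address.encode("ascii"))
        + osc_pad(type_tags.encode("ascii"))
        + b"".join(struct.pack(">f", value) for value in floats)
    )


# Hypothetical address; one float argument gives a 20-byte packet.
packet = osc_message("/fx/cutoff", 0.5)
```

The `encode("ascii")` call also shows the string limitation raised in this thread: a non-ASCII address or filename fails at that step.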

      But if you want to do something more complex, something that is different in kind, not precision, the advanced bits of OSC don’t really help even when they are supported. I think this is the point at which people often disagree, as this is highly specific to what kind of things you are trying to control.

      Let’s say, for example, that my exotic new tangible musical interface allows me to use Kinect to, say, construct an arbitrary FM synthesis graph, or an amazing projected sculpture constructed from arbitrary 3d objects. I’d need to transmit a complex, nested, hierarchical structure, perhaps with string keys, and untyped variables. Now, you could write a parser and serialiser to mash that structure into OSC’s native data format, lists. But then, if you’re going to parse and serialise anyway, I end up saying – sod it, I’ll serialise the damn thing to a JSON string and send that string as a blob over OSC, and then deserialise it at the other end. Then you end up thinking – well, if I’m serialising it to JSON anyway, why am I using OSC, which operates on weird ports, and prefers UDP, and thus tends to get firewalled/fail to arrive on the open internet. Why not use HTTP? From there it’s a short leap to – well, if I’m sending JSON over HTTP anyway, I guess I can now use a RESTful stack, and get well-defined CRUD-semantics, and, oh, hey, now I can talk the native language of my iOS/Android device of choice and build UIs in the browser, using many lovely, high-quality web-app UI libraries. And surely that’s a more future-proof choice, as Web Audio APIs move forward and the OS moves into the browser, right?

      Add a few other annoying wrinkles in there like unknowable, variable limits on UDP (and therefore OSC) packet size, and the fact that it only supports ASCII strings (heaven forbid you collaborated with, say, a Hindi sitar player who named their sample files in devanagari script, because you can’t refer to those filenames as OSC strings) and it gets really annoying.

      Philosophically speaking, I’d say that looking at JSON-over-UDP as a relative of the realtime web only conflates the issues a little bit. Because, unlike OSC, HTTP-plus-JSON has a clean separation of concerns, and that’s a selling point. So I can send JSON over raw UDP, or over a websocket, or HTTP, or a unix socket, or a pigeon. Contrariwise, OSC collapses layers 5-7 of the OSI stack into one fused monolith, and AFAICT using it, e.g. as a serialisation format or transporting it by HTTP or whatever is outside the supported spec. On the other hand, if I’m building an interaction app using HTTP, or (maybe?) Websockets, I can decide that this flakey, imprecise, possibly slow JSON implementation is not my cup of tea, and switch to Protocol Buffers, or whatever data format takes my fancy.

      I wrote a detailed rant about this, BTW, in an (abandonware) project I was working on to make, in fact, browsers talk to OSC clients. I’ve certainly ranted enough for here, though, so how’s about I just link to it, eh?

      Addendum: I just fact-checked myself there and indeed OSC 1.1 no longer presumes UDP as a transmission protocol. However, as all the OSC software I have does presume UDP as the protocol, I’ll leave my rants as-is. 

    • Peter Kirn

      Oh, and I’m now reading where your quotes are. What are you missing in OSC? I’m expecting some news to share shortly on enhancements to OSC.

      Because JSON in this case is sent over UDP, I don’t see this as having anything to do with the realtime Web – unless we’re talking websockets, but then you can also use *OSC* with the realtime Web. So I’d be careful about conflating issues.

  • duncan speakman

    i’m ‘usually’ very cynical about non tactile music systems and their promo videos, but this time I’m interested and positive . . .

    “since when did a musician become a dancer?”  – i think this is the point exactly, thinking about these systems as tools for just musicians isn’t useful in my head. ‘musical instruments’ are designed for ergonomics (generally, and of course acoustics sometimes 😉 !) –  I’ve seen a lot of systems in the past where the movements of a dancer become constrained by the nature of the system. e.g. to trigger a repeating beat they have to repeat the same move/position. This limits the potential of the dance, an inherently ‘visual’ artform. 

    What I see in this project is the ability to respond to the choreography instead – to be able to pick/select certain parts of the dancer’s vocabulary and decide what elements of the music are controlled. 
    Obviously this video is focused on the control/trigger by the movement, but we’re not paying attention to the fact that a musician could work in collaboration, deciding on what elements are playing when, what should be sonically manipulated by the visual work. 
    Think about putting the system in a situation like this

    so all in all I guess I’m saying that if we think about this as a novelty to make laptop shows more interesting then we’re limiting the possibilities of both the musicians and the dancers involved. Of course there are people that can do both, but doesn’t fruitful collaboration between disciplines often lead to amazing outcomes?

    ( and for the complaints about the song – give them a break,  it’s a promotional video for the system, so of course they’re going with something fashionable/poppy, –  there’s a reason the wider public has never seen the amazing systems that have come out of places like STEIM )

  • Guest

    Exactly like killing a fly with a bass cannon 😀

  • Marc Resibois

    The video is interesting. One thing you directly notice in the beginning – and that is a known fact – is that Kinect, like other such systems, has a lot of lag. So either they re-edited the sound or that guy is just playing dummy, because you can’t do a precise drop with such a system. It’s nice looking, the show looks nice, but I’m not convinced there’s so much magic.

    • Peter Kirn

      That’s not necessarily true. The timing accuracy they get in this video is consistent with what I know to be possible. 

    • Marc Resibois

      Maybe I wasn’t specific enough. I wasn’t speaking about the video feedback, that looks about right. My feeling is that the music is way too tight and perfect for being triggered through such a laggy system.

    • Peter Kirn

      No, I just don’t agree, I don’t think. The musical time is all locked to what’s going on in Ableton. The amount of lag Kinect produces is perfectly acceptable for this application, just modulating effects, etc. (It’s also determined by the kind of processing you’re doing, etc.)

  • Jesse Engel

    DDR anyone?

  • Guest

    So what I’ve understood from all this is that you can trigger clips (scenes) by waving your hands in certain directions. That’s very nice, but a bit of a boring gimmick! There is still no real interaction… 

    Now that they put a commercial music style over it, it’s supposed to suddenly be innovative/interesting?!? I saw Amon Tobin do this in his ISAM tour last year already!

    Why doesn’t the guy just hold two Wii controllers in his hands? They can do exactly the same. Much cheaper solution ;-)

    Can’t Kinect recognize shapes you make with your arms/legs? That would be much more interesting to watch. The performer could really dance while deciding what to trigger. Do some “Tutting” at the same time…


    Can you imagine editing video AND sound in final cut pro using a kinect? This is going to be awesome.

  • FCM

    I understand the skepticism, here, even if it’s a bit too pessimistic.  About the latency, I’m willing to bet that the rhythmic triggers are quantized, and the performer is anticipating beats ever so slightly.  The corollary is that the video was probably edited to sync up to suggest the live-ness of the effect.  Not a lie, necessarily, but not the truth, either.  (Oh God, now we’re on thin ice)  

    I think some of the gradual effects (snare builds, filter sweeps, etc) could be done in real-time.  I mean, the track is REALLY beat-tight, it just doesn’t sound like a live performer.  Compare live scratching and pad drumming to sequenced/quantized.  Now, if there were no quantize settings, and the video & audio were not nudged in post, then that would truly be a very good performance.  
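Whether this performance used quantized triggers is a guess, but the idea itself is simple to sketch: a gesture detected slightly early is held until the next grid line, so small timing errors (and sensor lag, if the performer anticipates) disappear from the audio. The function below is an illustrative assumption, not a description of how Ableton Live or this project implements it.

```python
import math


def next_grid_time(trigger_time, bpm, steps_per_beat=1):
    """Return the first grid line at or after trigger_time, in seconds.

    steps_per_beat=1 snaps to beats, 4 snaps to sixteenth notes.
    """
    step = 60.0 / bpm / steps_per_beat
    # The small epsilon keeps a trigger that lands exactly on a grid line
    # from being pushed to the following one by floating-point rounding.
    return math.ceil(trigger_time / step - 1e-9) * step


# A trigger detected 70 ms before the beat at 120 BPM fires on the beat.
fire_at = next_grid_time(1.93, bpm=120)
```

The trade-off is the one argued over in this thread: the tighter the grid, the more the result sounds live but exposes lag; the coarser the grid, the more the timing belongs to the sequencer rather than the performer.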

    Now for some optimism.  The tron-graphics are enthralling, but they do promise a bit more future than is currently delivered by any system, today.  But this is just pointing towards the possibilities, and this is just the beginning of experimentation.  Roland’s first drum machines failed at their first purpose.  The Kinect hacks may fail at their purpose, but there is definitely something interesting going on.  

    Anyway, recurrent, breathless enthusiasm for the ‘future’ is a bit tiring, even if it will yield something down the road.  (And it’s exactly the stuff that academics live and breathe, when their grant-writing begins to infect their actual music making)  

    Buffett and Shirky both have something to add to the conversation:

    Beware of geeks bearing formulas.

    When a technology becomes boring, that’s when the social effects become interesting. 

    • Peter Kirn

      Who’s being breathless? I think a whole lot of people here are balancing what’s compelling here and what isn’t. 

      Actually, the one thing I’m surprised no one has raised is this: there’s no reason to assume Kinect is the end point of this kind of depth-sensing tracking. I don’t think it’s unreasonable to expect a system with lower-latency processing. That’s technically very feasible; it just hasn’t been productized yet.

    • FCM

      I didn’t call anyone out on being breathless, but technology reviews DO tend to be breathless, generally speaking.  It’s the zeitgeist of tech talk on the internet.  Quoting manufacturers’ specs, rather than testing them and such.  Most of the internet is breathless about the possibilities of things, without always taking the time to flesh things out.  This is a trap that’s easy to fall into.  And so is bashing the music for being cheesy.  Sure, the video and music are cheesebiz dubschlep.  So fuckin what?  I don’t care about the music – I only care if I can reassign that parameter!!  It’s a simple trigger or a value!  But clouding that are the two poles listed above that distort any interesting conversation, in the comments here and on a lot of other online discussion.  Balanced and compelling didn’t come immediately to mind.
      But as to the quantizing – do you think his triggers are in real time?  It’s hard to find an acoustic musician who’s not a drummer to play that tightly, even hard to find a drummer, sometimes.  Dancers, who specialize in full body motion like that are never in sync in big groups, even the best aren’t synced up like musicians have to be.  Video record a string quartet from the back of a room and you will have noticeable lag that you have to correct in post.  So I’m deeply surprised to find someone who combines the two so well.  

      He’s playing in an open space, which means the sound from the speakers is going to create latency of 5ms or more [the acoustic space], then there’s the projections [the visual space], and then there’s the audio generation [the virtual audio space].  I mean, there is latency in any system that has DSP.  EQ introduces latency in the form of phase-shifting.  The timpanist in the back of an orchestra has latency due to acoustic distance so they play a little before the beat.  
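To put a rough number on the acoustic part: sound travels at about 343 m/s in air at room temperature, so the delay depends only on how far the performer stands from the speakers. The distances below are illustrative, not measurements from this performance.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed round figure for air at about 20 °C


def acoustic_delay_ms(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Time in milliseconds for sound to travel distance_m metres."""
    return distance_m / speed * 1000.0


def distance_for_delay_m(delay_ms, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance in metres that sound covers in delay_ms milliseconds."""
    return delay_ms / 1000.0 * speed
```

By this estimate, 5 ms corresponds to standing roughly 1.7 m from the speaker, while 10 m of distance adds about 29 ms, which is on top of any sensing and audio-processing delay.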

      So lag/latency/etc is not a simple yes or no issue.  It’s a worthy question, how to deal with beat-perfect music, and latency.  And on a practical level, I think they were strategic with the boundaries that the “live performance” was allowed to live within, which was mostly the FX.  And whenever you’re editing video, it becomes all the more possible to intervene in post, etc.  

      Also, when you’re pushing the fun factor so high, as they did in the video, you do leave yourself open to explaining whether it’s ‘live’ or not, but more importantly, in which WAYS it is ‘live’.  

  • // PAPERIZED //

    I think it’s a really cool conceptual work they’ve done. And that’s what it’s all about: finding out how music will be produced or performed in the next few years. I recently finished a work of my own with Kinect, Ableton and Processing, searching for new interface and user concepts, for my bachelor’s degree. Maybe take a look and drop me a line about what you think. It’s another approach, more focused on the musician, their music and the human-machine interaction than on the performance itself. But to finish this comment: it’s really good work except for the latency issue between audio and visuals.